Mass import into jbovlaste?


najrut

Oct 10, 2011, 1:18:31 PM
to loj...@googlegroups.com
While revising the GoldenDict dictionaries, I started wondering whether I could merge some of them.
But the main GoldenDict dictionary is regularly updated from the jbovlaste database.

So is there any way to import Leo Molas's names of countries and languages (like gugde'usu) into jbovlaste all at once?
For my part, I can prepare a file ready for such an import.
Just tell me its format and I'll send it to you, or apply the changes myself (which I'm obviously afraid to do).

.alyn.post.

Oct 10, 2011, 1:38:17 PM
to loj...@googlegroups.com
FWIW, here is the jbovlaste source code:

https://github.com/lojban/jbovlaste

By poking around in there, you might be able to determine whether
mass import is supported or what you'd have to do to make that
happen.

I'm given to understand that this tool is still unmaintained, so
I don't think there is anyone around to answer questions about it;
the source code has to stand on its own.

-Alan


--
.i ma'a lo bradi cu penmi gi'e du

Robin Lee Powell

Oct 10, 2011, 4:04:16 PM
to loj...@googlegroups.com
It's lightly maintained by me; I just don't like doing it.

https://github.com/lojban/jbovlaste/tree/master/bin -- the "snarf"
scripts are the ones that were used for automated import. If you
can update one of those, great. If you can simply get things in the
same format as the main gismu list, I can probably make it work.

If there's someone around who isn't offended by Perl + Mason and
wants to help with jbovlaste, that'd be awesome.

Way *way* better, though, would be someone who wants to work on
its successor; we have various ideas floating around about how that
would look. It would be very interesting, hard work. We don't care
much what language you do it in, although selecting something that
other Lojbanists know and can tolerate would be good, and I have
some sense of what those are if people actually are interested.

-Robin

Jonathan Jones

Oct 10, 2011, 5:01:50 PM
to loj...@googlegroups.com
I'm interested, but I don't yet have the necessary skill. Give me a couple more years of college and I might, then I'm more than happy to volunteer.
--
mu'o mi'e .aionys.

.i.e'ucai ko cmima lo pilno be denpa bu .i doi.luk. mi patfu do zo'o
(Come to the Dot Side! Luke, I am your father. :D )

Luke Bergen

Oct 10, 2011, 7:25:21 PM
to loj...@googlegroups.com
Sounds like a fun project for a Rails app. Are there any design documents/specifications or anything?

Looking through jbovlaste, I see a lot of things like "in the sense" used to describe an English translation of a Lojban word. Are there any other gotchas/things to keep in mind for someone designing a jbovlaste 2.0?

Robin Lee Powell

Oct 10, 2011, 10:29:05 PM
to loj...@googlegroups.com
What I have in my head is very much not like the current version.
I have a general feeling that the way we structure definitions isn't
actually a good fit for how Lojban words work, and I want to make
something that can be used for a formal dictionary, or flashcards,
or glossing.

The only thing I have is a document Tene and I worked on,
https://docs.google.com/document/d/1U6Q9u4_ZwqZzyy7MW-1dfpkx4oxlc0seF82zp2FlqzE/edit?hl=en_US

I would be more than happy to meet on skype or something and flesh
out the concepts (i.e. to work together to produce a more sane
document).

-Robin

najrut

Oct 11, 2011, 10:38:04 AM
to loj...@googlegroups.com
> If you can simply get things in the same format as the main gismu list, I can probably make it work.
Sounds promising. What is the format of the gismu list? Do you mean https://github.com/lojban/jbovlaste/blob/master/bin/gismu.txt ?

.alyn.post.

Oct 11, 2011, 10:56:56 AM
to loj...@googlegroups.com
Attached, find a PEG grammar for the gismu list. It may or may not
be useful...

-Alan

On Tue, Oct 11, 2011 at 07:38:04AM -0700, najrut wrote:
> > If you can simply get things in the same format as the main gismu list,
> > I can probably make it work.
> Sounds promising. What is the format of gismu list? Do you
> mean https://github.com/lojban/jbovlaste/blob/master/bin/gismu.txt ?

gismu-list.peg

Robin Lee Powell

Oct 11, 2011, 4:41:50 PM
to loj...@googlegroups.com
On Tue, Oct 11, 2011 at 07:38:04AM -0700, najrut wrote:

That's the one.

Or just tab-separated with obvious fields. Parsing such a thing is
trivial next to the code for the actual import, which is already
basically written.

-Robin

--
http://singinst.org/ : Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei". My personal page: http://www.digitalkingdom.org/rlp/

najrut

Oct 12, 2011, 10:57:32 AM
to loj...@googlegroups.com
Then how can I help?
These lists are already tab-separated if you download them in text format.
Glosswords and definitions are there.
Can you please do it yourself, or do I need to prepare them in a somewhat different way?

najrut

Oct 14, 2011, 11:16:33 AM
to loj...@googlegroups.com
So are there still any obstacles? Should I post tab-separated text files here?
They could have the following format:
valsi   TAB     translation    TAB     glossword    TAB     in the sense

Will this suit you?
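najrut's proposed layout is easy to check mechanically before handing it over. Below is a minimal Python sketch, assuming UTF-8 text with exactly four tab-separated fields per line and a final "in the sense" field that may be empty. `Entry` and `parse_import_lines` are hypothetical names for illustration; they are not part of jbovlaste or its snarf scripts, and the sample rows are made up rather than taken from Leo Molas's lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One proposed entry (hypothetical record type, not part of jbovlaste)."""
    valsi: str
    translation: str
    glossword: str
    in_sense: Optional[str]  # None when the "in the sense" column is empty


def parse_import_lines(text: str) -> List[Entry]:
    """Parse lines of the form valsi<TAB>translation<TAB>glossword<TAB>in the sense.

    Blank lines are skipped. Any other line that does not split into
    exactly four tab-separated fields raises ValueError with its
    1-based line number, so problems surface before anything is sent.
    """
    entries: List[Entry] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore empty or whitespace-only lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        valsi, translation, glossword, sense = (f.strip() for f in fields)
        entries.append(Entry(valsi, translation, glossword, sense or None))
    return entries


if __name__ == "__main__":
    # Illustrative rows only; the second has an empty "in the sense" column.
    sample = "gugde'usu\tUnited States\tUSA\tcountry\n\ngugdefuru\tFrance\tFrance\t\n"
    for entry in parse_import_lines(sample):
        print(entry)
```

Requiring all four fields (rather than padding short rows) is a deliberate choice in this sketch: a missing trailing tab is more likely a data error than an intentionally empty column. Whether the actual import code would want an empty "in the sense" value as `None` or as an empty string is not known from the thread.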

Robin Lee Powell

Oct 15, 2011, 2:55:26 AM
to loj...@googlegroups.com
The only real obstacle is me finding/taking the time.

-Robin


najrut

Oct 15, 2011, 4:18:44 AM
to loj...@googlegroups.com
One more request.
If you find time for this, could you please import them so that they all have a high rating (like 10,000 votes or so)?
They were all generated by an algorithm, so in my opinion the voting system is inappropriate here. They are like official data.

Jonathan Jones

Oct 15, 2011, 4:29:06 AM
to loj...@googlegroups.com
I disagree with giving the ISO generated words a high rating. I consider them nothing more than a useful backup for when a person doesn't have a "real" word. I do not think they are appropriate for official designation.

This, by the way, is why the voting system is there: so that those of us who care can vote for or against a certain word for a certain meaning.


najrut

Oct 15, 2011, 5:08:05 AM
to loj...@googlegroups.com
Then something must be done to change the rating of all of them at once, since they all definitely act like one word, or like "la'oi US cu gugde" for gugde'usu.

.alyn.post.

Oct 15, 2011, 12:03:33 PM
to loj...@googlegroups.com
I don't understand the problem with applying the regular voting
mechanism to this work, will you explain that?

Why is it that these words deserve special consideration?

-Alan


najrut

Oct 15, 2011, 12:38:21 PM
to loj...@googlegroups.com
They were all created by one algorithm. They all reflect just one table of ISO codes.
If we reject this table, then we reject all those valsi. And vice versa.
If gugdefuru has the right to exist, why can't gugde'usu? And vice versa.
I consider all those words to be one large regular table.

najrut

Oct 15, 2011, 12:40:14 PM
to loj...@googlegroups.com
Also, if you vote up one word from that list, all the others should be voted up automatically. They are all equal. None of them is better than any other.

.alyn.post.

Oct 15, 2011, 1:11:40 PM
to loj...@googlegroups.com
I understand that some of these words have gismu that mean
(essentially) the same thing, like brito. Yet the gismu
were created with one algorithm as well. Will you help me
understand how your reasoning applies here?

If I reject brito must I reject all of the gismu? Conversely,
if I reject whatever word means roughly the same thing as brito
in this word set, must I reject all of the words in this word
set?

I don't understand how your argument for regularity applies for
other "regular" sets of words, particularly when there is an
apparent conflict.

To be clear, I'm curious about your request to have all of these
words treated as a unit/special case, not about the words
themselves. (That conversation was had when they were created.) In the far
future, if, say, we discover that gugde'usu is a very, very terrible
word (let's pretend that in that country's language it sounds
uncomfortably similar to baby raping), do we retroactively rescind
all of these words? Do we tell users of that word that this sound
does not mean what they think it means? Do we let the future deal
with the future's problems? Is "being created by one algorithm" a
sufficient reason to create a block of words? If so, is it a
sufficient reason to rescind/remove/destroy a block of words? Are
you willing to stand behind the removal of all of these words for
the same reason you're stating we should stand behind creating
them?

-Alan


.alyn.post.

Oct 15, 2011, 1:13:18 PM
to loj...@googlegroups.com

I need to request again that you preserve the threading of e-mail
messages when you reply to them. It is not clear at all to me
which message you reply to, when I read only your reply. I cannot
tell what this e-mail is in reply to.

Will you include the message you are replying to in your reply? I
will not be able to continue a conversation with you if I am
confused by the ordering of your messages.

-Alan

Jonathan Jones

Oct 15, 2011, 3:00:08 PM
to loj...@googlegroups.com
While I disagree with najrut, I have to say that I really don't think it matters. I think I know Robin well enough to know it's not going to happen.

Robin hasn't even promised to upload the words. All he's said is basically, when he has the time, he might do it.

Doing anything beyond merely uploading them, any kind of special handling, would require even more time and effort.

Messing with jbovlaste in any way is very low on Robin's priority list, especially considering his desire for jbovlaste 2.0 to be created to replace it.

Conclusion: Robin might, /might/, at some time in the future upload all the ISO words to jbovlaste. But nothing more. But more likely, najrut will have to put them on there himself if he wants them there this year.




As to the words themselves, my reason for not wanting them is that when it comes to all-or-nothing situations, I prefer nothing. The conversation that spawned those two huge lists was about the cultural gismu, and the fact that some cultures had a gismu whereas others did not, and that this was an example of Lojban /not/ being culturally neutral. It was decided that the only way to be culturally neutral was either for all cultures to have their own words, which resulted in a huge list of words based on ISO codes of countries and languages, or for none to have their own word, which resulted in the creation of this list:

banra'a
klura'a
cemra'a
jdara'a
selgu'era'a
tutra'a

For example, {lu lo glibau to mintu la'oi English toi banra'a la'oi U.K. li'u mintu lu lo glibau cu brito lo bangu li'u .e zoi gy. English is the language of the U.K. .gy}.

Personally, I would much rather remember 6 words that can be used for /any/ cultural reference, than a huge slew of words that can each only be used for one.


Luke Bergen

Oct 15, 2011, 3:53:51 PM
to loj...@googlegroups.com

By the way, I'd like to address that point. If one culture is objectively more influential on the average universe of discourse, why should it not have its own gismu even if other cultures do not? Apple is a common enough thing that we have a gismu for it. Kumquat is obscure or not referenced often enough, so we say "make a lujvo/fu'ivla for it". This is not Lojban being "biased" towards apples or against kumquats. It is Lojban accurately reflecting the needs of its users. And if you're a kumquat lover or Nigerian and are offended, I apologize.

Pierre Abbat

Oct 15, 2011, 4:00:42 PM
to loj...@googlegroups.com
On Saturday 15 October 2011 04:29:06 Jonathan Jones wrote:
> I disagree with giving the ISO generated words a high rating. I consider
> them nothing more than a useful backup for when a person doesn't have a
> "real" word. I do not think they are appropriate for official designation.

I think that, if they are entered, they should be entered with no votes. They
begin with a rafsi, but are not valid type-3 fu'ivla. And I agree that
they're nothing more than a useful backup. I'd much rather have each country
or language name individually crafted than all turned out by that algorithm.

Pierre
--
When a barnacle settles down, its brain disintegrates.
It no longer understands anything, it no longer understands anything.

.alyn.post.

Oct 15, 2011, 4:59:25 PM
to loj...@googlegroups.com
.i zoi .url. http://qkme.me/356mp7 .url. mu'o mi'e .alyn.

On Sat, Oct 15, 2011 at 01:00:08PM -0600, Jonathan Jones wrote:

najrut

Oct 16, 2011, 3:32:54 AM
to loj...@googlegroups.com
Thank you all for the explanation. Now I think it's even better to keep those lists in separate dictionaries. I have no more questions about this import.


On Saturday, October 15, 2011 11:00:08 PM UTC+4, aionys wrote:
> Personally, I would much rather remember 6 words that can be used for /any/ cultural reference, than a huge slew of words that can each only be used for one.

Me too

Jonathan Jones

Oct 16, 2011, 5:32:45 AM
to loj...@googlegroups.com
On Sun, Oct 16, 2011 at 1:32 AM, najrut <ruler...@gmail.com> wrote:
> > Personally, I would much rather remember 6 words that can be used for /any/ cultural reference, than a huge slew of words that can each only be used for one.
>
> Me too

Then you go to those pages and up-vote those words. :)

Jonathan Jones

Oct 16, 2011, 5:32:58 AM
to loj...@googlegroups.com
*should

najrut

Oct 16, 2011, 5:40:47 AM
to loj...@googlegroups.com
vi'o u'i