
FOSS tools for lexicon generation?

martinobal

Mar 7, 2008, 2:29:26 PM
Hi!

I'm aware that this topic has been discussed before, but I couldn't
find exactly what I'm looking for in the archives.

There are several tools for maintaining a lexicon and generating
inflexions according to rules. Then there's software like Langmaker,
which, besides being non-free and unmaintained, comes close to what I
want but isn't quite it. I'm judging from what I've heard and from
some web apps that supposedly do something similar: they just give you
a random list of words, not the full list of possible words matching
the regular expression or BNF you entered.

So, what I need is a program/script which can:

1) Generate the FULL list of possible words according to a
user-defined specification, like an EBNF grammar or regular
expressions. For instance, "Word := C V C | V C" with additional
restrictions, like "the first letter can't be 'h'".
2) Let me define another list of "dummy names" for the words in the
lexicon, so that I can build example sentences like "#article-1
#noun-dog #verb-barks" without using actual words of the conlang,
which may not even be assigned yet.
3) Let me build a third list of translations into one or more other
languages.
4) Let me assign conlang words to their dummy names, and dummy names
to their translations.
5) Tell me at any time which words are still available (not yet
assigned to a dummy name) and which dummy names have no word; which
dummy names have no translation yet and which translations have no
dummy name; and which translations have no conlang word and which
words have no translation.
6) Output the results in a suitable table format (word - dummy name -
translation) to be inserted in text or HTML documents.
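
Points 2 to 6 above boil down to three lists plus two mappings, so they can be sketched without a database. The following is only a rough Ruby sketch under assumptions of mine: the class name Lexicon, its method names, and the sample words are all hypothetical, and the "table" is plain text with " | " separators rather than real HTML.

```ruby
# Sketch of points 2-6: three lists and two mappings keyed by dummy name.
# The class name, method names, and sample data are hypothetical.
class Lexicon
  def initialize(words, dummy_names, translations)
    @words = words
    @dummy_names = dummy_names
    @translations = translations
    @word_for = {}         # dummy name => conlang word
    @translation_for = {}  # dummy name => translation
  end

  # Point 4: assignments.
  def assign_word(dummy, word)
    @word_for[dummy] = word
  end

  def assign_translation(dummy, translation)
    @translation_for[dummy] = translation
  end

  # Point 5: what is still unassigned, in each direction.
  def free_words
    @words - @word_for.values
  end

  def dummies_without_word
    @dummy_names - @word_for.keys
  end

  def dummies_without_translation
    @dummy_names - @translation_for.keys
  end

  def unused_translations
    @translations - @translation_for.values
  end

  # Point 6: one row per dummy name; empty string where nothing is assigned.
  def rows
    @dummy_names.map do |d|
      [@word_for.fetch(d, ''), d, @translation_for.fetch(d, '')]
    end
  end

  def to_table
    rows.map { |r| r.join(' | ') }.join("\n")
  end
end

lex = Lexicon.new(%w[kat tak ak], ['#noun-dog', '#verb-barks'], %w[dog barks cat])
lex.assign_word('#noun-dog', 'kat')
lex.assign_translation('#noun-dog', 'dog')
puts lex.to_table
puts "Free words: #{lex.free_words.join(', ')}"
```

Keying both mappings by dummy name keeps the word and the translation independent of each other, so either can be filled in first.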

Most of this can be done with a database, like kexi (similar to MS
Access) or OOBase. There are even more flexible options, like
powerloom. The problem I have is how to interact with the database for
automated lexicon generation (point 1). So, maybe if a script can
generate a CSV text file of words from a BNF description, that could
be imported to the database.
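
As a rough idea of what such a script might look like, here is a minimal Ruby sketch that expands "Word := C V C | V C" into the full word list, drops words starting with "h", and writes a one-column CSV. The consonant and vowel inventories, the file name words.csv, and the helper names are assumptions made for illustration; it handles only flat patterns like the example, not general BNF.

```ruby
require 'csv'

# Hypothetical inventories; replace with the conlang's own sounds.
CLASSES = {
  'C' => %w[h k t],
  'V' => %w[a i]
}.freeze

# "Word := C V C | V C", each alternative written as a list of class names.
PATTERNS = [%w[C V C], %w[V C]].freeze

# Expand one pattern into every string it can produce (a Cartesian product).
def expand(pattern)
  pattern.map { |cls| CLASSES.fetch(cls) }
         .reduce(['']) { |acc, letters| acc.product(letters).map(&:join) }
end

# All words from all alternatives, minus those breaking the restriction
# "the first letter can't be h".
def all_words
  PATTERNS.flat_map { |p| expand(p) }
          .reject { |w| w.start_with?('h') }
          .uniq
end

# Write the list as a one-column CSV with a header row.
def write_csv(path, words)
  CSV.open(path, 'w') do |csv|
    csv << ['word']
    words.each { |w| csv << [w] }
  end
end

write_csv('words.csv', all_words) if __FILE__ == $PROGRAM_NAME
```

The resulting file should be importable as a single-column table by most database front ends; further restrictions can be added as extra reject steps.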

All ideas are welcome :)

Regards,

Anonymous

Mar 7, 2008, 2:46:12 PM
> All ideas are welcome :)

If I needed to generate a CSV list of words according to a formula, I
would write a BASIC program to do it, probably using the freeware
Chipmunk BASIC interpreter.

I realize BASIC is extremely unfashionable -- nobody will be impressed
if you admit to using it -- but it's free, it's similar to English
(thus easily learned and user-friendly), and it has a lot of
text-handling functions.

You could do it with C or Java too, I suspect.

--
-30-

Rick Harrison

Mar 7, 2008, 3:16:31 PM
Below is a program I whipped up to show how you can generate a list of
all possible syllables using BASIC. It took about 8 minutes to write
this, which shows how easy BASIC is to use. Feel free to modify this
and use it however you wish. Obviously doing all possible _words_ would
be more complex than doing all possible syllables, but you have to
start somewhere...

REM * this program generates all possible syllables *
REM * first we create arrays to hold our strings *
DIM a$(50)
DIM b$(50)
DIM c$(50)

REM * put permitted initial consonants into a$ array *
Initials:
READ y$
IF y$ = "xxx" THEN GOTO Vowels
counta = counta + 1
a$(counta) = y$
GOTO Initials

REM * put permitted vowels into b$ array *
Vowels:
READ y$
IF y$ = "xxx" THEN GOTO Finals
countb = countb + 1
b$(countb) = y$
GOTO Vowels

REM * put permitted finals into c$ array *
Finals:
READ y$
IF y$ = "xxx" THEN GOTO Wrap
countc = countc + 1
c$(countc) = y$
GOTO Finals

REM * create the syllables and output them to a textfile *
Wrap:
OPEN "syllables.txt" FOR OUTPUT AS #1
FOR i = 1 TO counta
  FOR j = 1 TO countb
    FOR k = 1 TO countc
      sa$ = a$(i)
      sb$ = b$(j)
      sc$ = c$(k)
      IF sc$ = "zero" THEN sc$ = ""
      syllable$ = sa$ + sb$ + sc$
      PRINT #1, syllable$
    NEXT k
  NEXT j
NEXT i
CLOSE #1
PRINT "Finished. It was a pleasure to serve you."
END

REM * below is where the initials, medials, and finals are stored *
REM * xxx marks the end of each list *
DATA b, ch, d, xxx
DATA a, i, u, xxx
DATA zero, n, s, xxx

- - - - - that's the end of the program - - - - -

- - - - - below is what the output file looks like - - - - -

ba
ban
bas
bi
bin
bis
bu
bun
bus
cha
chan
chas
chi
chin
chis
chu
chun
chus
da
dan
das
di
din
dis
du
dun
dus
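
Going from syllables to whole words, as mentioned at the top of this post, mostly means chaining syllables together. Here is a small sketch in Ruby (a language that comes up later in this thread) rather than BASIC; the helper name words_up_to and the three sample syllables are made up for illustration, and a real conlang would probably need extra filters for forbidden clusters.

```ruby
# Hypothetical helper: every word made of 1..max_syllables syllables.
# uniq is needed because different syllable sequences can spell the same word.
def words_up_to(syllables, max_syllables)
  (1..max_syllables).flat_map do |n|
    syllables.repeated_permutation(n).map(&:join)
  end.uniq
end

# 3 one-syllable words plus 9 two-syllable words.
puts words_up_to(%w[ba ban bi], 2).length
```

The list grows very quickly: 27 syllables already give 27 + 729 = 756 candidate words of up to two syllables, before any duplicates are removed.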

martinobal

Mar 8, 2008, 1:05:22 PM
On Mar 7, 9:16 pm, Rick Harrison <n...@alone.com> wrote:
> Below is a program I whipped up to show how you can generate a list of
> all possible syllables using BASIC. It took about 8 minutes to write
> this, which shows how easy BASIC is to use. Feel free to modify this
> and use it however you wish. Obviously doing all possible _words_ would
> be more complex than doing all possible syllables, but you have to
> start somewhere...

Thanks, Rick and Anonymous, for your prompt replies and tips!
This script looks interesting. I'm a bit more familiar with Java than
with BASIC, but I've found your script very useful as a guide. Now I
have to find out how to combine this with a DB like kexi or OOBase, or
some other tool, to get the features I want. On the other hand, it
would be nice to do this with some linguistic IDE, like GATE.

Regards,

John W. Kennedy

Mar 9, 2008, 12:55:15 AM
**************Ruby***************
# this program generates all possible syllables
initials = %w/b ch d/
medials = %w/a i u/
finals = [''] + %w/n s/
open('syllables.txt', 'w') do |out|
  initials.each do |initial|
    medials.each do |medial|
      finals.each do |final|
        out.puts initial + medial + final
      end
    end
  end
end
puts 'Finished. It was a pleasure to serve you.'

**************Java***************
// this program generates all possible syllables
import java.io.PrintWriter;
import java.io.IOException;
public final class Syllables {
    private static final String[] initials = {"b", "ch", "d"};
    private static final String[] medials = {"a", "i", "u"};
    private static final String[] finals = {"", "n", "s"};
    public static void main(final String[] args) throws IOException {
        final PrintWriter out = new PrintWriter("syllables.txt");
        for (final String anInitial : initials)
            for (final String aMedial : medials)
                for (final String aFinal : finals)
                    out.println(anInitial + aMedial + aFinal);
        out.close();
        System.out.println("Finished. It was a pleasure to serve you.");
    }
}


--
John W. Kennedy
"Only an idiot fights a war on two fronts. Only the heir to the throne
of the kingdom of idiots would fight a war on twelve fronts"
-- J. Michael Straczynski. "Babylon 5", "Ceremonies of Light and Dark"

Rick Harrison

Mar 9, 2008, 10:56:31 AM
In article <47d37bc5$0$25039$607e...@cv.net>, John W. Kennedy
<jwk...@attglobal.net> wrote:

> **************Ruby***************
> # this program generates all possible syllables
> initials = %w/b ch d/
> medials = %w/a i u/

(etc)

Cool! Those are elegant translations.

martinobal

Mar 9, 2008, 10:05:04 PM
On Mar 9, 6:55 am, "John W. Kennedy" <jwke...@attglobal.net> wrote:
> **************Ruby***************
> # this program generates all possible syllables
> initials = %w/b ch d/
> medials = %w/a i u/
> finals = [''] + %w/n s/
(...)

I've just tried the Ruby version with KDevelop and it works great,
thanks! :)

Rick Harrison

Apr 9, 2008, 5:14:05 AM

I'm not sure how many people saw your message. Some servers filter out
binary attachments from text-oriented newsgroups.

lenadi_moucina

May 15, 2008, 1:56:11 PM
Hello!
This is my first message in this group.

To generate all syllables, I wrote a small JavaScript program
embedded in an HTML page.
I'll try to attach the code here, but the system may strip it out
(web code inside web code!).
******************************************************************************************************************

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
<head>
<title>SILABUS 0.0</title>
<meta name="autoro" content="Daniel Macouin">

<script language="JavaScript">
<!-- JavaScript
function silabus() {
  // read the three comma-separated lists from the form
  var liste1 = document.form.L1.value.split(",");
  var liste2 = document.form.L2.value.split(",");
  var liste3 = document.form.L3.value.split(",");

  var silabaire = [];

  // loops: list 2 is the outermost loop, as in the original
  for (var v = 0; v < liste2.length; v++) {
    for (var i = 0; i < liste1.length; i++) {
      for (var j = 0; j < liste3.length; j++) {
        silabaire.push(liste1[i] + liste2[v] + liste3[j]);
      }
    }
  }
  // join with commas; this avoids the stray leading comma
  document.form.silabaire.value = silabaire.join(",");
}
// - JavaScript - -->
</script></head>

<body bgcolor="#CCFFCC" text="black" link="blue" vlink="purple"
alink="red">

<div align="center"><table border="0">
<tr>
<td colspan="3" bgcolor="maroon"><h1 align="center"><span
style="background-color:maroon;"><font
color="olive">SILABUS
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</
font></span></h1></td>
<td bgcolor="maroon"><p align="center"><font
color="#906D65">Lenadi
MOUCINA .2008. libera programo</font></td>
</tr>
<tr>
<td width="883" colspan="3"><div align="center"><table
border="10" cellpadding="15"
cellspacing="0" width="75%" bgcolor="#7FA256"
style="border-width:8; border-style:outset;">
<tr>
<td><form name="form" method="get">
<h1 align="center"><span style="background-color:maroon;"><font
color="yellow">&nbsp;&nbsp;1 &nbsp;</font></span><font
color="yellow">&nbsp;<input type="text"
name="L1" value="x,xy,gh,rt"
size="30"></font></h1>
<h1 align="center"><span style="background-color:maroon;"><font
color="yellow">&nbsp;&nbsp;2&nbsp;&nbsp;</font></span><font
color="yellow">&nbsp;<input type="text"
name="L2" value="a,i"
size="30"></font></h1>
<h1 align="center"><span style="background-color:maroon;"><font
color="yellow">&nbsp;&nbsp;3 &nbsp;</font></span><font
color="yellow">&nbsp;<input type="text"
name="L3" value="k,t,rt"
size="30"></font></h1>
<h1 align="center"><font
color="yellow">&nbsp;</font><input
type="button" name="envoi" value="...... &gt;&gt;&gt;&gt;&gt;"
onclick="silabus() ;" style="font-style:normal; font-weight:bolder;
font-size:x-large; color:yellow; background-color:maroon;
text-decoration:none;">
: [1] [2] [3]</h1></td>
</tr>
</table></div></td>
<td rowspan="2"><p align="center"><textarea name="silabaire"
rows="25"
cols="35" wrap="virtual"></textarea></td>
</tr>
<tr>
<td width="33%"></form>
<h2><span style="background-color:maroon;"><font
color="white">&nbsp;b,c,d,f,g,h,j,k,l,m,n,p,q,r,s,t,v,x</font></span></h2></td>
<td width="33%"><h2><span style="background-color:maroon;"><font
color="white">a,e,i,o,u</font></span></h2></td>
<td width="33%"><h2><span style="background-color:maroon;"><font
color="white">y,w</font></span></h2></td>
</tr>
</table></div>
<p>&nbsp;</p>
</body>

</html>
***************************
That's all!
You can see the page here : http://danielmacouin.chez-alice.fr/silabus.htm

Rick Harrison

May 18, 2008, 2:10:47 PM

Thank you, Lenadi. It is fun and easy to use the webpage you made.