Phonetic Name Generator

Josh Sako

Sep 24, 2008, 6:19:40 AM

Hi there, folks!

I started writing a Rogue Like about two weeks ago and took a break
from getting the game object system perfected to muck about with
random name generation. Below is my first stab at it, a probabilistic
approach using English phonetics.

Header File:

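#include <map>
#include <string>

typedef unsigned int uint; // not a standard type, so define it here

class NameGenerator {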

enum PHENOM_TYPE {
CONSONANT = 1,
VOWEL = 2,
DIPHTHONG = 3
};

typedef std::map<std::string, uint> phenom_list;

private:

// A list of phonemes, mapped to how likely they are
// The chance of each appearing is independent
phenom_list consonant_list;
phenom_list vowel_list;
phenom_list diphthong_list;

// a list of phonemes, mapped to their English character translations
std::multimap<std::string, std::string> phenom_trans;

protected:

// rolls _die d _sides, adds up the result, then returns the number
uint roll(uint _die, uint _sides);

// Chooses a phoneme from the passed in list
std::string choosePhenom(phenom_list &_list);

// Choose a consonant
std::string chooseConsonant(void) {

return choosePhenom(consonant_list);
}

// Choose a vowel
std::string chooseVowel(void) {

return choosePhenom(vowel_list);
}

// Choose a diphthong
std::string chooseDip(void) {

return choosePhenom(diphthong_list);
}

public:
// constructor
NameGenerator();

// Generates a random string of phonemes up to _max_length long,
// being a minimum of _min_length, and having an early exit chance of
// _exit_chance each time through the generation routine
std::string generatePhenom(uint _max_length, uint _min_length,
uint _exit_chance);

// Translate a string of phonemes into English characters
std::string translatePhenom(std::string _phenom);
};

CPP File:

#include <cstdlib> // rand()

// constructor
NameGenerator::NameGenerator() {

// Set up consonant tables
consonant_list["p"] = 10; // p as in pen, strip, tip
consonant_list["b"] = 10; // b as in but, web
consonant_list["t"] = 10; // t as in two, sting, bet
consonant_list["tS"] = 10; // 'ch' as in CHair, naTure, teaCH
consonant_list["T"] = 10; // 'th' as in THing, breaTH
consonant_list["d"] = 10; // d as in do, odd
consonant_list["dZ"] = 10; // 'j' as in Gin, Joy, eDGE
consonant_list["k"] = 10; // k as in Cat, Kill, sKin, QUeen thiCK
consonant_list["g"] = 10; // g as in go, get, beg
consonant_list["f"] = 10; // f as in fool, enough, leaf
consonant_list["v"] = 10; // v as in voice, have, of
consonant_list["D"] = 10; // 'th' as in THis, breaTHE
consonant_list["s"] = 10; // s as in See, City, paSS
consonant_list["z"] = 10; // z as in zoo, rose
consonant_list["S"] = 10; // 'sh' as in SHe, Sure, emoTIon, leaSH
consonant_list["Z"] = 10; // 'zh' as in pleaSUre, beiGE
consonant_list["h"] = 10; // h as in ham
consonant_list["m"] = 10; // m as in man, ham
consonant_list["n"] = 10; // n as in No, tiN
consonant_list["N"] = 10; // ng as in siNGer, riNG
consonant_list["l"] = 10; // l as in left, bell
consonant_list["r"] = 10; // r as in Run, veRy
consonant_list["w"] = 5; // w as in We
consonant_list["j"] = 10; // 'y' as in Yes
consonant_list["W"] = 10; // 'wh' as in WHat
consonant_list["x"] = 10; // 'ch' as in loCH

// Translation table for consonants

phenom_trans.insert(std::pair<std::string,std::string>("p","p"));
phenom_trans.insert(std::pair<std::string,std::string>("b","b"));
phenom_trans.insert(std::pair<std::string,std::string>("t","t"));
phenom_trans.insert(std::pair<std::string,std::string>("T","th"));
phenom_trans.insert(std::pair<std::string,std::string>("tS","t"));
phenom_trans.insert(std::pair<std::string,std::string>("tS","ch"));
phenom_trans.insert(std::pair<std::string,std::string>("d","d"));
phenom_trans.insert(std::pair<std::string,std::string>("d","dd"));
phenom_trans.insert(std::pair<std::string,std::string>("dZ","g"));
phenom_trans.insert(std::pair<std::string,std::string>("dZ","j"));
phenom_trans.insert(std::pair<std::string,std::string>("dZ","dge"));
phenom_trans.insert(std::pair<std::string,std::string>("k","c"));
phenom_trans.insert(std::pair<std::string,std::string>("k","k"));
phenom_trans.insert(std::pair<std::string,std::string>("k","qu"));
phenom_trans.insert(std::pair<std::string,std::string>("k","ck"));
phenom_trans.insert(std::pair<std::string,std::string>("g","g"));
phenom_trans.insert(std::pair<std::string,std::string>("f","f"));
phenom_trans.insert(std::pair<std::string,std::string>("f","ough"));
phenom_trans.insert(std::pair<std::string,std::string>("v","v"));
phenom_trans.insert(std::pair<std::string,std::string>("v","f"));
phenom_trans.insert(std::pair<std::string,std::string>("th","th"));
phenom_trans.insert(std::pair<std::string,std::string>("D","th"));
phenom_trans.insert(std::pair<std::string,std::string>("D","the"));
phenom_trans.insert(std::pair<std::string,std::string>("s","s"));
phenom_trans.insert(std::pair<std::string,std::string>("s","c"));
phenom_trans.insert(std::pair<std::string,std::string>("s","ss"));
phenom_trans.insert(std::pair<std::string,std::string>("z","z"));
phenom_trans.insert(std::pair<std::string,std::string>("z","x"));
phenom_trans.insert(std::pair<std::string,std::string>("z","se"));
phenom_trans.insert(std::pair<std::string,std::string>("S","s"));
phenom_trans.insert(std::pair<std::string,std::string>("S","sh"));
phenom_trans.insert(std::pair<std::string,std::string>("S","ti"));
phenom_trans.insert(std::pair<std::string,std::string>("Z","su"));
phenom_trans.insert(std::pair<std::string,std::string>("Z","ge"));
phenom_trans.insert(std::pair<std::string,std::string>("h","h"));
phenom_trans.insert(std::pair<std::string,std::string>("m","m"));
phenom_trans.insert(std::pair<std::string,std::string>("n","n"));
phenom_trans.insert(std::pair<std::string,std::string>("N","ng"));
phenom_trans.insert(std::pair<std::string,std::string>("l","l"));
phenom_trans.insert(std::pair<std::string,std::string>("l","le"));
phenom_trans.insert(std::pair<std::string,std::string>("l","ll"));
phenom_trans.insert(std::pair<std::string,std::string>("r","r"));
phenom_trans.insert(std::pair<std::string,std::string>("w","w"));
phenom_trans.insert(std::pair<std::string,std::string>("j","y"));
phenom_trans.insert(std::pair<std::string,std::string>("W","wh"));
phenom_trans.insert(std::pair<std::string,std::string>("x","ch"));

// Vowel list
vowel_list["A"] = 10; // 'a' as in father
vowel_list["i"] = 30; // 'e' as in sEE
vowel_list["I"] = 10; // 'i' as in cIty
vowel_list["E"] = 30; // 'e' as in bEd
vowel_list["3`"] = 10; // 'ir' as in bIRd
vowel_list["{"] = 10; // 'a' as in lAd, cAt, rAn
vowel_list["Ar"] = 10; // 'ar' as in ARm
vowel_list["V"] = 10; // 'u' as in rUn, enOUgh
vowel_list["0"] = 10; // 'o' as in nOt, 'a' as in wAsp
vowel_list["O"] = 10; // 'aw' as in lAW, cAUght
vowel_list["U"] = 10; // 'u' as in pUt
vowel_list["u"] = 30; // 'oo' as in sOOn, thrOUgh
vowel_list["@"] = 10; // 'a' as in about
vowel_list["@`"] = 10; // 'er' as in winnER

// Translation table for vowels
phenom_trans.insert(std::pair<std::string,std::string>("A","a"));
phenom_trans.insert(std::pair<std::string,std::string>("i","e"));
phenom_trans.insert(std::pair<std::string,std::string>("i","ee"));
phenom_trans.insert(std::pair<std::string,std::string>("I","ei"));
phenom_trans.insert(std::pair<std::string,std::string>("I","i"));
phenom_trans.insert(std::pair<std::string,std::string>("E","e"));
phenom_trans.insert(std::pair<std::string,std::string>("3`","ir"));
phenom_trans.insert(std::pair<std::string,std::string>("{","a"));
phenom_trans.insert(std::pair<std::string,std::string>("Ar","ar"));
phenom_trans.insert(std::pair<std::string,std::string>("V","u"));
phenom_trans.insert(std::pair<std::string,std::string>("V","ou"));
phenom_trans.insert(std::pair<std::string,std::string>("0","o"));
phenom_trans.insert(std::pair<std::string,std::string>("O","au"));
phenom_trans.insert(std::pair<std::string,std::string>("O","aw"));
phenom_trans.insert(std::pair<std::string,std::string>("U","u"));
phenom_trans.insert(std::pair<std::string,std::string>("u","oo"));
phenom_trans.insert(std::pair<std::string,std::string>("u","ou"));
phenom_trans.insert(std::pair<std::string,std::string>("@","a"));
phenom_trans.insert(std::pair<std::string,std::string>("@`","er"));

// Diphthongs
diphthong_list["e"] = 10; // 'ay' as in dAY
diphthong_list["aI"] = 10; // 'y' as in mY
diphthong_list["OI"] = 10; // 'oy' as in bOY
diphthong_list["o"] = 10; // 'oh' as in nO
diphthong_list["aU"] = 10; // 'ow' as in nOW
diphthong_list["ir"] = 10; // 'ere' as in nEAR, hERE
diphthong_list["er"] = 30; // 'air' as in thERE, hAIR
diphthong_list["Ur"] = 30; // 'our' as in tOUR
diphthong_list["ju"] = 10; // 'u' as in pUpil

// Diphthong translations
phenom_trans.insert(std::pair<std::string,std::string>("e","ay"));
phenom_trans.insert(std::pair<std::string,std::string>("aI","y"));
phenom_trans.insert(std::pair<std::string,std::string>("aI","ai"));
phenom_trans.insert(std::pair<std::string,std::string>("OI","oy"));
phenom_trans.insert(std::pair<std::string,std::string>("o","o"));
phenom_trans.insert(std::pair<std::string,std::string>("aU","ow"));
phenom_trans.insert(std::pair<std::string,std::string>("ir","ear"));
phenom_trans.insert(std::pair<std::string,std::string>("ir","ere"));
phenom_trans.insert(std::pair<std::string,std::string>("er","ear"));
phenom_trans.insert(std::pair<std::string,std::string>("er","air"));
phenom_trans.insert(std::pair<std::string,std::string>("Ur","our"));
phenom_trans.insert(std::pair<std::string,std::string>("ju","u"));
}

// rolls _die d _sides, adds up the result, then returns the number
uint NameGenerator::roll(uint _die, uint _sides) {

uint result = 0;

// For each die to roll,
for (uint i = 0; i < _die; ++i) {

// Roll the die and add it to the result
result = result + (rand() % _sides + 1);
}

// return the result
return result;
}

// Chooses a phoneme from the passed in list
std::string NameGenerator::choosePhenom(phenom_list &_list) {

std::string phen = "";

// We try until a winner is found
while (phen == "") {

// Choose a random spot in the phoneme list
uint pos = roll(1, _list.size()) - 1;

// Roll to see if we add this
uint chance = roll(1, 100);

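phenom_list::iterator p = _list.begin();

// Move the iterator
for (uint i = 0; i < pos; ++i) {
++p;
}

// Accept the phoneme with a percent chance equal to its weight.
// The original test, (*p).second <= chance, was inverted: it made
// the lowest-weighted phonemes the most likely to be accepted.
if (chance <= (*p).second) {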

// Add it to the string
phen = (*p).first;

// Break out of the loop
break;
}

}

return phen;
}

// Generates a random string of phonemes up to _max_length long, being
// a minimum of _min_length, and having an early exit chance of
// _exit_chance each time through the generation routine
std::string NameGenerator::generatePhenom(uint _max_length,
uint _min_length, uint _exit_chance) {

std::string phenom;

// Choose a random state to start as
uint state = roll(1, 3);
uint chance = 0;
std::string phenom_type;

// For every possible phoneme
for (uint len = 0; len < _max_length; ++len) {

// Switch on what type of sound we want
switch (state) {

case CONSONANT:

phenom_type = chooseConsonant();

// Append it to the string with the separator token
phenom = phenom + "|";
phenom = phenom + phenom_type;

// Change the state
chance = roll(1, 100);

// vowels mostly come after consonants
if (chance <= 60) {

state = VOWEL;
}
// Another consonant
else if (chance <= 95) {
state = CONSONANT;
}
// A diphthong
else {
state = DIPHTHONG;
}

break;

case VOWEL:

phenom_type = chooseVowel();

// Append it to the string
phenom = phenom + "|";
phenom = phenom + phenom_type;

// Change the state
chance = roll(1, 100);
// vowel pairs are unusual
if (chance <= 10) {

state = VOWEL;
}
// Consonants aren't
else if (chance <= 95) {
state = CONSONANT;
}
// A diphthong
else {
state = DIPHTHONG;
}

break;

case DIPHTHONG:

phenom_type = chooseDip();

// Append it to the string
phenom = phenom + "|";
phenom = phenom + phenom_type;

// Change the state
chance = roll(1, 100);
// Only consonants and vowels allowed after
// diphthongs
if (chance <= 10) {

state = VOWEL;
}
else {
state = CONSONANT;
}

break;

}

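// Check to see if we break early. Written as len + 1 >= _min_length so
// that a _min_length of 0 cannot underflow the unsigned subtraction.
if (len + 1 >= _min_length) {
if (roll(1, 100) < _exit_chance) {
break;
}
}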

}

return phenom;
}

// Translate a string of phonemes into English characters
std::string NameGenerator::translatePhenom(std::string _phenom) {

std::string translated; // The translated string
std::string token; // The current token

std::multimap<std::string, std::string>::iterator p_find;
std::multimap<std::string, std::string>::iterator p_last;

// skip delimiters at beginning.
std::string::size_type lastPos = _phenom.find_first_not_of("|",
0);

// find the end of the first token (the next delimiter).
std::string::size_type pos = _phenom.find_first_of("|", lastPos);

// For every token in the string,
while (std::string::npos != pos || std::string::npos != lastPos)
{

// Find a token
token = _phenom.substr(lastPos, pos - lastPos);

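// Look up the translation(s). lower_bound is used because
// multimap::find may return any element with a matching key, not
// necessarily the first, so stepping forward from it could leave
// the range of entries for this token.
p_find = phenom_trans.lower_bound(token);
uint trans_count = phenom_trans.count(token);

// If a token was found,
if (trans_count > 0) {

// Get one of the possible translations randomly
uint trans_num = roll(1, trans_count) - 1;

for (uint i = 0; i < trans_num; i++) {
++p_find;
}

translated = translated + (*p_find).second;

}
else {
// Unknown token: mark it with an apostrophe
translated = translated + "'";
}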

// Skip delimiters. Note the "not_of"
lastPos = _phenom.find_first_not_of("|", pos);

// Find the end of the next token (the next delimiter)
pos = _phenom.find_first_of("|", lastPos);

}

return translated;
}

It produces some pretty interesting output even with all those
hard-coded magic numbers. The results below were generated with
generatePhenom(10, 3, 60); you can get wildly different results just by
changing those three settings:

|O|T|A|k
authac

|aU|dZ|A
owga

|V|f|I|t|@`
oufiter

|OI|Z|j
oygey

|3`|dZ|E
irje

|e|r|W
ayrwh

|o|n|w|V
onwou

|O|l|w
aulew

|OI|t|@
oyta

|E|f|3`
efir

|s|w|I|tS
ssweich

|p|m|I
pmei

|Ur|j|d
ourydd

|Ur|r|OI
ourroy

|l|j|x|g
lychg

|s|W|b
swhb

|OI|s|0|u|w|u|p|I
oysoouwoopei

|x|@`|s
cherss

|aI|z|@
yxa

|@|f|A
afa

Not bad, but not perfect. My next step is to move the rules and
translation tables over to a separate object, something like a
Phonetic Dictionary. To change the style of the words (or the rules by
which they are generated), you could then just set a different
dictionary. Markov chains would probably be a better, easier way to
go, but... what's life without a bit of over-engineering, eh?
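
For comparison, the Markov-chain route mentioned above might look roughly like the following. This is only a minimal sketch and not part of the posted generator: the class name MarkovNames, the order-1 letter model, and the '^' and '$' start/end markers are all assumptions made for illustration, and it relies on C++11 <random>.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical order-1 letter Markov chain; none of these names come
// from the NameGenerator class above.
class MarkovNames {
    // For each character, every character observed to follow it.
    // Repeats are kept so that common successors are drawn more often.
    std::map<char, std::vector<char> > next_;
    std::mt19937 rng_;

public:
    explicit MarkovNames(unsigned seed) : rng_(seed) {}

    // Record the transitions in one training name.
    // '^' marks the start of a name and '$' marks the end.
    void train(const std::string &name) {
        char prev = '^';
        for (std::string::size_type i = 0; i < name.size(); ++i) {
            next_[prev].push_back(name[i]);
            prev = name[i];
        }
        next_[prev].push_back('$');
    }

    // Walk the chain from '^' until '$' is drawn or max_length is reached.
    std::string generate(std::string::size_type max_length) {
        std::string out;
        char prev = '^';
        while (out.size() < max_length) {
            std::map<char, std::vector<char> >::const_iterator it =
                next_.find(prev);
            if (it == next_.end()) {
                break; // nothing has been trained yet
            }
            const std::vector<char> &options = it->second;
            std::uniform_int_distribution<std::size_t> pick(
                0, options.size() - 1);
            char c = options[pick(rng_)];
            if (c == '$') {
                break;
            }
            out += c;
            prev = c;
        }
        return out;
    }
};
```

Training it on a list of real names and calling generate(10) a few times would be the usual way to try it; the style of the output then depends entirely on the training list rather than on hand-written rules.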

Jakub Debski

Sep 24, 2008, 7:21:55 AM

After serious thinking Josh Sako wrote :
> CPP File:

> phenom_trans.insert(std::pair<std::string,std::string>("p","p"));

In CPP you can use "using namespace std".
Also typedef is your friend :)
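
As a sketch of what the typedef suggestion could look like: the alias names trans_map and trans_entry and the helper buildTable are made up here for illustration and do not appear in the posted code.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical aliases for the translation table and its entries.
typedef std::multimap<std::string, std::string> trans_map;
typedef trans_map::value_type trans_entry;

// Builds a small table the same way the constructor does, minus the
// repeated std::pair<std::string,std::string> spelling.
trans_map buildTable() {
    trans_map table;
    table.insert(trans_entry("p", "p"));
    table.insert(trans_entry("k", "c"));
    table.insert(trans_entry("k", "k"));
    // std::make_pair also works and deduces the pair type.
    table.insert(std::make_pair(std::string("k"), std::string("ck")));
    return table;
}
```

This keeps the std:: qualification everywhere else, so it shortens the table setup without pulling the whole namespace in.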

> dictionary. Markov chains would probably be a better, easier way to
> go, but... what's life without a bit of over engineering, eh?

it's math ;)

regards,
Jakub

Radomir 'The Sheep' Dopieralski

Sep 24, 2008, 10:33:05 AM

At Wed, 24 Sep 2008 03:19:40 -0700 (PDT),
Josh Sako wrote:

> Hi there, folks!
>
> I started writing a Rogue Like about two weeks ago and took a break
> from getting the game object system perfected to muck about with
> random name generation. Below is my first stab at it, a probabilistic
> approach using English phonetics.

It's a great idea and I really appreciate you sharing your code,
but could you maybe upload it somewhere next time (I'm sure there
are lots of free hosting options for sourcecode of opensource apps
and libs) and just link here?

Copy-pasting source code from a news client is not really a nice
way of trying it out; reading source code in a newsreader is also
rather inconvenient (no syntax highlighting or block-based motion).

Sorry for complaining.
--
Radomir `The Sheep' Dopieralski <http://sheep.art.pl>
"Whenever you find yourself on the side of the majority,
it's time to pause and reflect." -- Mark Twain

Maxy-B

Sep 24, 2008, 10:33:51 AM

On Sep 24, 5:19 am, Josh Sako <jgi...@gmail.com> wrote:
> Hi there, folks!
>
> I started writing a Rogue Like about two weeks ago and took a break
> from getting the game object system perfected to muck about with
> random name generation. Below is my first stab at it, a probabilistic
> approach using English phonetics.

Be sure also to check out the source code of Stone Soup. I believe
they generate the random names of shopkeepers (and other stuff) with
something similar.

Ah, I'll never forget the game where I discovered "Fag Cum's
Jewellery Shop".

--
Max

Mario Donick

Sep 24, 2008, 1:30:51 PM

On Sep 24, 12:19 pm, Josh Sako <jgi...@gmail.com> wrote:
> Hi there, folks!
>
> I started writing a Rogue Like about two weeks ago and took a break
> from getting the game object system perfected to muck about with
> random name generation. Below is my first stab at it, a probabilistic
> approach using English phonetics.
>
> Header File:
>
> class NameGenerator {

[snip]

Very long post. Too long with all that code. ;)

Besides, I like all approaches that try to use some clean linguistic
concepts for doing random language stuff instead of just mashing
together half-knowledge based on one's own experience. ;)

However, although you have a clear distinction between different
categories, this is not always visible in the results. I think not all
of your examples are pronounceable by English speakers. Depending on
the style of your game this might not be too bad -- but it contradicts
your English-based phonetic approach.

Mario Donick

Josh Sako

Sep 24, 2008, 8:31:10 PM

>In CPP you can use "using namespace std".

Doesn't that defeat the purpose of having namespaces in the first
place?

>Also typedef is your friend :)

Yeah, I used it on one of the definitions, but not the other.

>It's a great idea and I really appreciate you sharing your code,
>but could you maybe upload it somewhere next time (I'm sure there
>are lots of free hosting options for sourcecode of opensource apps
>and libs) and just link here?

Ahh, sure. I wasn't aware of the rules for that sort of thing here.
Mea culpa. I'll do so in the future.

>Be sure also to check out the source code of Stone Soup. I believe
>they generate the random names of shopkeepers (and other stuff) with
>something similar.

Interesting! I'll have a look, thanks!

>
> Very long post. Too long with all that code. ;)

Again, I apologize.

>
> Besides, I like all approaches that try to use some clean linguistic
> concepts for doing random language stuff instead of just mashing
> together half-knowledge based on one's own experience. ;)

Well, I'm not a linguist at all, so this really is a ham-fisted
half-knowledge attempt. I just did a bit of research and cobbled
together some concepts in a quick-and-dirty fashion.

> However, although you have a clear distinction between different
> categories, this is not always visible in the results. I think not all
> of your examples are pronounceable by English speakers. Depending on
> the style of your game this might not be too bad -- but it contradicts
> your English-based phonetic approach.

Yes, this is a problem. The reason seems to be that, while I implement
independent probabilities for every phoneme, that's all the rules
there are. There are some special-cases which need specific rules to
produce better results; I added two, in fact. The first looks at the
phoneme string and sees if a "weird" character pairing is at the end
(like 'w' or 'wh'); if so, it just appends a new vowel. (This could
conceivably make the string one longer than its _max_length, but the
probability of that happening seems remote enough that it can be
effectively ignored, at least until I refactor the code.) The second
rule counts the number of consecutive consonants, then uses an
increasing probability curve to make it more likely that a vowel will
be chosen next: the more consecutive consonants, the greater the
chance that a new vowel will be added. This doesn't quite eliminate
cases of no vowels, but it does seem to improve things.
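
The code for the consecutive-consonant rule isn't posted, so here is a minimal sketch of one possible curve. The function name vowelChance, the 60% base (borrowed from the CONSONANT case in generatePhenom) and the 20-point step per extra consonant are assumptions, not the values actually used. Unlike the rule described above, this version caps at 100 and so forces a vowel after three consonants in a row.

```cpp
// Hypothetical helper: percent chance (0-100) that the next phoneme is
// a vowel, given how many consonants were just emitted in a row.
unsigned vowelChance(unsigned consecutive_consonants) {
    const unsigned base = 60; // chance after a single consonant
    const unsigned step = 20; // assumed increase per extra consonant

    if (consecutive_consonants <= 1) {
        return base;
    }
    unsigned extra = consecutive_consonants - 1;
    // Cap at 100 without risking overflow in base + step * extra.
    if (extra >= (100 - base) / step) {
        return 100;
    }
    return base + step * extra;
}
```

In generatePhenom, the CONSONANT case would then compare roll(1, 100) against vowelChance(run_length) instead of the fixed 60, where run_length is a counter reset whenever a vowel or diphthong is emitted.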

I also dramatically improved the random number routine so that its
period is much longer. This has noticeably improved the diversity of
strings produced. With these improvements, here are some examples:

|aU|w|@`|f|er|D|@|z
owweroughairthaz

|e|w|{|U
aywau

|z|b|I|h
sebih

|U|v|u
ufou

|O|d|@`
audder

|j|o|f|@`
yofer

|z|T|u|g
xthoug

|OI|d|V|i|z|p|U|r
oydouexpur

|V|m|i
umee

|p|A|N|n|O|Z|b|u
pangnawsuboo

|W|i|x
whech

|d|E|W|V
dewhu

|j|Ar|n|dZ
yarng

|n|A|m|j|e|tS
namyaych

|S|U|N
tiung

|O|f|W|I
auoughwhei

This seems much better overall.
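
The improved random number routine isn't shown in the thread. One way to get a long-period generator, assuming a C++11 compiler, is std::mt19937 from <random>. The free function rollDice below is a hypothetical stand-in for NameGenerator::roll, not the replacement that was actually written.

```cpp
#include <random>

// Rolls `dice` dice with `sides` sides each and returns the sum.
// Returns 0 if either argument is 0.
unsigned rollDice(std::mt19937 &rng, unsigned dice, unsigned sides) {
    if (dice == 0 || sides == 0) {
        return 0;
    }
    // uniform_int_distribution avoids the slight bias of rand() % sides.
    std::uniform_int_distribution<unsigned> die(1, sides);
    unsigned total = 0;
    for (unsigned i = 0; i < dice; ++i) {
        total += die(rng);
    }
    return total;
}
```

The generator object would be created once (for example as a class member seeded at construction) and passed to every roll, rather than being re-seeded per call.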

Mario Donick

Sep 25, 2008, 1:18:40 AM

> Yes, this is a problem. The reason seems to be that, while I implement
> independent probabilities for every phoneme, that's all the rules
> there are. There are some special-cases which need specific rules to
> produce better results; I added two, in fact. The first looks at the
> phoneme string and sees if a "weird" character pairing is at the end
> (like 'w' or 'wh')

This is a good step in the right direction. How do you determine which
pairs are "weird"?

[There is some linguistic knowledge about typical and non-typical
phoneme combinations in several languages. I don't know about them for
English in particular (I am German, and a linguist of German), but I am
sure that you can find information on this topic on the Internet or in
books.]

Mario Donick

Josh Sako

Sep 25, 2008, 2:01:06 AM

> This is a good step in the right direction. How do you determine which
> pairs are "weird"?

At the moment, I just generate a big batch of strings and look for
patterns which are unpronounceable. I haven't discovered any general
algorithm which can describe exceptions and what to do about them.

Thinking it over, it looks like I should add two new lists to the
dictionary, one for use when generating phonemes and one for use when
translating. These lists would contain exception rules, and the
dictionary would run through them one by one each time through the
loop to make sure all the generations and translations make sense. I
need them during both generation and translation because some
exceptions occur in only one of the two. For example, the hard 'k'
sound is perfectly acceptable at the end of a word, but the specific
rendering 'qu' should not be used there; one of the other possible
renderings should be used instead. One thing to consider is what to
do when there are no acceptable alternatives; I suppose in that case
either dropping the phoneme or translation, or appending an accent
mark of some kind (like Ch'thoth), would do.

In order to do this, the generation and translation routines look like
they need the current generation/translation string as well as the
future state. With information about the past, present, and immediate
future, it should be possible to write a set of rules to make sure
that strings always end up pronounceable.
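
A translation-time exception list of this kind could be sketched as follows. The rule shape (a spelling plus a flag saying it is banned at the end of a word) and the names SpellingRule and allowedSpellings are assumptions made for illustration; only the 'qu'-at-the-end example comes from the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rule: `spelling` may not be used for the last phoneme
// of a word when banned_at_end is set.
struct SpellingRule {
    std::string spelling;
    bool banned_at_end;
};

// Returns the candidate spellings that survive every rule. If nothing
// survives, the result is empty and the caller has to fall back, for
// example by dropping the phoneme or emitting an apostrophe.
std::vector<std::string> allowedSpellings(
        const std::vector<std::string> &candidates,
        const std::vector<SpellingRule> &rules,
        bool is_last_phoneme) {
    std::vector<std::string> allowed;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        bool ok = true;
        for (std::size_t r = 0; r < rules.size(); ++r) {
            if (rules[r].banned_at_end && is_last_phoneme &&
                rules[r].spelling == candidates[i]) {
                ok = false;
                break;
            }
        }
        if (ok) {
            allowed.push_back(candidates[i]);
        }
    }
    return allowed;
}
```

The translation routine would call this with the spellings for the current token and then pick randomly among the survivors. Rules that depend on the previous or next phoneme would need extra fields, which is where the past/present/future information comes in.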

I also might drop the probabilities for individual phonemes. They're
mostly the same and don't seem to affect the outcome enough for the
extra complexity. Maybe if they had a higher standard deviation in
probabilities...

> [There is some linguistic knowledge about typical and non-typical
> phoneme combinations in several languages. I don't know about them for
> English in particular (I am German, and a linguist of German), but I am
> sure that you can find information on this topic on the Internet or in
> books.]

Are there any websites or search terms you recommend as a starting
point?

Mario Donick

Sep 25, 2008, 3:15:51 AM

For a start, Wikipedia is sufficient, especially the following parts:

http://en.wikipedia.org/wiki/English_phonology#Syllable-level_rules
http://en.wikipedia.org/wiki/English_phonology#Word-level_rules

There is enough information there to create rules for a name generator.

For more detailed information, I suggest the following two papers:

L. Bauer: "The phonotactics of some English morphology"
- URL: www.victoria.ac.nz/lals/staff/laurie-bauer/Bauer-Phonotactics.pdf

A. Weber: "The role of Phonotactics in the Segmentation of Native and
Non-Native Continuous Speech"
- URL: www.coli.uni-saarland.de/~aweber/swap2000text.pdf

Mario Donick

Ray Dillinger

Sep 25, 2008, 11:28:40 PM

Mario Donick wrote:

> This is a good step in the right direction. How do you determine which
> pairs are "weird"?
>
> [There is some linguistic knowledge about typical and non-typical
> phoneme combinations in several languages. I don't know about them for
> English in particular (I am German, and a linguist of German), but I am
> sure that you can find information on this topic on the Internet or in
> books.]

English is very strange because it borrows words (and spelling conventions)
from many other languages. Generally, however, there's a three-phoneme
test that works well for identifying words whose pronunciation will seem
consistent.

You can find phonetic combination frequency tables for English - I
recommend Freidman's and Norvig's, although nothing prevents you from
compiling your own. These give relative frequencies for the most
common three-phoneme sequences found in English.

A first approximation gives one phoneme per letter, but there are a lot
of special cases where it isn't true. English uses a lot more phonemes
than there are letters in its alphabet, and for each phoneme it seems
there are at least three or four different ways to write it. Doubled
P's, L's, T's, O's and S's for example represent single phonemes, as do
digraphs like ch, ph, gh, th, and so on. English also treats a number
of combined vowels like ie, ea, and so on as single phonemes. Trailing
E is usually not a phoneme itself, but instead denotes a modified
vowel sound in the last syllable. 's' codes for a different phoneme at
the beginning of a word or following a consonant than when found
following a vowel. And so on. There's a big list of rules in an
appendix -- I think it was in Norvig's book, but most of my computer
linguistics books are in boxes right now so I'm not looking it up.
Anyway, if you code those rules, you have a much closer approximation
to the phoneme sequence.

Now you take your phonetic combination frequency table and use it like
a "sliding window" to examine each subsequence of three phonemes. Each
word is assigned a score, which is the product of the relative
frequencies of its three-phoneme sequences divided by the logarithm of
its length. Words that score higher are more likely to seem to
'belong' in English, or be 'familiar' to speakers of English. Words
that are unpronounceable will generally score zero (they contain at
least one three-phoneme sequence whose frequency is zero).
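
The sliding-window score described above could be sketched like this. The table layout (keys of the form "p1|p2|p3"), the names TrigramTable and scoreWord, the use of the natural logarithm, and the choice to score words shorter than three phonemes as zero are all assumptions; real frequencies would have to come from a published or self-compiled table.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical table: "p1|p2|p3" -> relative frequency of that
// three-phoneme sequence.
typedef std::map<std::string, double> TrigramTable;

// Scores a word given as a list of phonemes. Words shorter than three
// phonemes have no window to test and score 0 here (an assumption).
double scoreWord(const std::vector<std::string> &phonemes,
                 const TrigramTable &table) {
    if (phonemes.size() < 3) {
        return 0.0;
    }
    double product = 1.0;
    // Slide a three-phoneme window across the word.
    for (std::size_t i = 0; i + 3 <= phonemes.size(); ++i) {
        std::string key =
            phonemes[i] + "|" + phonemes[i + 1] + "|" + phonemes[i + 2];
        TrigramTable::const_iterator it = table.find(key);
        if (it == table.end()) {
            return 0.0; // unseen sequence: treat as unpronounceable
        }
        product *= it->second;
    }
    // Natural log is assumed; the base would need tuning per language.
    return product / std::log(static_cast<double>(phonemes.size()));
}
```

A generator could produce a batch of candidates, score each one, and keep only those above some threshold, which replaces hand-written "weird pair" rules with a lookup.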

The same test works fine in other languages, but identifying phonemes
unambiguously using software is usually easier and the base of the
logarithm needs adjustment. Use the same base as for the logarithm
in Zipf's frequency rule for spelling for your language, adjusted
for the average number of letters per phoneme.

Alternatively -- almost as accurate and with far less linguistic
analysis -- you can use alphabetic rather than phonetic frequency
tables but you need four-letter sequences rather than three-phoneme
sequences. The disadvantage here is that the tables have to be utterly
huge or else they'll throw out a lot of words that are quite
pronounceable, and because they'd fill a whole book and they're not
very interesting, I don't know of anybody who's published a good
set. Compiling your own tables in this case would require simple
software, but you'd need access to a quite large corpus of text.
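
The "simple software" for compiling a four-letter table might look something like the sketch below. It assumes the corpus has already been split into lower-case words; reading the corpus and normalising the counts into relative frequencies are left out, and the function name countFourGrams is made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Counts every four-letter sequence found inside the given words.
// Words shorter than four letters contribute nothing.
std::map<std::string, unsigned long> countFourGrams(
        const std::vector<std::string> &words) {
    std::map<std::string, unsigned long> counts;
    for (std::size_t w = 0; w < words.size(); ++w) {
        const std::string &word = words[w];
        for (std::size_t i = 0; i + 4 <= word.size(); ++i) {
            ++counts[word.substr(i, 4)];
        }
    }
    return counts;
}
```

A candidate name can then be rejected if any of its four-letter windows is missing from the table, which is why a small corpus throws out many pronounceable words.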

Bear

Gelatinous Mutant Coconut

Sep 25, 2008, 11:59:14 PM

It has already been mentioned, but real-sounding English names are
especially difficult to generate from a single set of rules because
names are drawn from so many different languages.

What I would advise is to create several generators, each of which
draws from a much smaller set of phonemes. You could then have one
that generates Germanic-sounding names, one that generates
Spanish-sounding names, one that generates British-sounding names, and
so on. Your random name generator would choose to get a name from one
of these generators, either at random or based on caller-specified
probabilities. Not only should you be able to avoid unpronounceable
names more easily with more constrained sets of phonemes and stricter
rules per name origin, but you could then vary the frequency of the
language of origin of the names in different locations of your world,
which might help give different towns and countries some local flair.

(And you can, of course, make generators not based on real world
language names; a goblin name generator, for instance.)
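
Choosing among several generators by caller-specified probabilities could be sketched as below. The names StyleWeight and pickStyle are hypothetical, the styles are represented only by a label (the per-style generators themselves are not shown), and C++11 <random> is assumed.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical entry: a name style and its caller-specified weight.
struct StyleWeight {
    std::string style; // e.g. "germanic", "spanish", "goblin"
    unsigned weight;   // relative frequency in this region
};

// Picks one style with probability proportional to its weight.
// Returns an empty string if the list is empty or all weights are 0.
std::string pickStyle(const std::vector<StyleWeight> &styles,
                      std::mt19937 &rng) {
    unsigned long total = 0;
    for (std::size_t i = 0; i < styles.size(); ++i) {
        total += styles[i].weight;
    }
    if (total == 0) {
        return std::string();
    }
    std::uniform_int_distribution<unsigned long> dist(1, total);
    unsigned long roll = dist(rng);
    // Walk the list, subtracting weights until the roll lands in a slot.
    for (std::size_t i = 0; i < styles.size(); ++i) {
        if (roll <= styles[i].weight) {
            return styles[i].style;
        }
        roll -= styles[i].weight;
    }
    return std::string(); // not reached when total > 0
}
```

Each town or country would carry its own list of weights, and the returned label would select which generator (or which phonetic dictionary) produces the name.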

Soyweiser

Sep 30, 2008, 8:34:38 AM

On Sep 24, 4:33 pm, Radomir 'The Sheep' Dopieralski

I suggest using http://pastebin.com/ next time. It seems to be a kind
of code-sharing tool. Whatever you post there is public, which not
everybody will like, but it is better than nothing.

--
Soyweiser