I wanted a random name generator for the player, so I decided to write a
new one. It uses a very simple finite state machine to select between
individual letters in a way that creates pronounceable player
names. I look forward to people's comments and improvements.
The comments at the top explain the nature of the beast. The only
syllable based component is where I trigger the termination timer
using a quickly accelerating random chance to keep names short.
You can treat this code as public domain.
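// Needs <ctype.h>, <string.h> and (in C) <stdbool.h>.
// rand_chance(pct) and rand_choice(n) are helpers that are not shown here.
void rand_name(char *text, int len)
{
    // Very simple markov generator.
    // We repeat letters to make them more likely.
    const char *vowels = "aaaeeeiiiooouuyy'";
    const char *frictive = "rsfhvnmz";
    const char *plosive = "tpdgkbc";
    const char *weird = "qwjx";
    // State transitions..
    // v -> f, p, w, v'
    // v' -> f, p, w
    // f -> p', v
    // p -> v, f'
    // w, p', f' -> v
    int syllables = 0;
    int pos = 0;
    bool prime = false;
    char state;  // declaration did not survive in the post
    // Initial state choice
    // NOTE: the first test did not survive in the post; 30 is a guess.
    if (rand_chance(30))
        state = 'v';
    else if (rand_chance(40))
        state = 'f';
    else if (rand_chance(70))
        state = 'p';
    else
        state = 'w';
    while (pos < len-1)
    {
        // Apply current state
        switch (state)
        {
            case 'v':
                text[pos++] = vowels[rand_choice(strlen(vowels))];
                // NOTE: assumed; the line that counts syllables was lost.
                if (!prime) syllables++;
                break;
            case 'f':
                text[pos++] = frictive[rand_choice(strlen(frictive))];
                break;
            case 'p':
                text[pos++] = plosive[rand_choice(strlen(plosive))];
                break;
            case 'w':
                text[pos++] = weird[rand_choice(strlen(weird))];
                break;
        }
        // Chance to stop..
        // NOTE: the stop test itself was lost; this growing chance is a guess.
        if (syllables && pos >= 3 && rand_chance(20 * syllables))
            break;
        // Transition to the next state
        switch (state)
        {
            case 'v':
                if (!prime && rand_chance(10))
                {
                    state = 'v';
                    prime = true;
                    break;
                }
                else if (rand_chance(40))
                    state = 'f';
                else if (rand_chance(70))
                    state = 'p';
                else
                    state = 'w';
                prime = false;
                break;
            case 'f':
                if (!prime && rand_chance(50))
                {
                    prime = true;
                    state = 'p';
                    break;
                }
                state = 'v';
                prime = false;
                break;
            case 'p':
                if (!prime && rand_chance(10))
                {
                    prime = true;
                    state = 'f';
                    break;
                }
                state = 'v';
                prime = false;
                break;
            case 'w':
                state = 'v';
                prime = false;
                break;
        }
    }
    // Capitalize the first letter only (the post had lost the [0] indices).
    text[0] = toupper(text[0]);
    text[pos++] = '\0';
}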
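The code calls rand_chance and rand_choice, which are not shown in the post. A minimal sketch of what they might look like, assuming rand_chance(p) succeeds about p times out of 100 and rand_choice(n) returns an index from 0 to n-1 (both bodies are guesses built on the standard rand(), not the POWDER versions):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed semantics: true roughly `percent` times out of 100. */
bool rand_chance(int percent)
{
    return (rand() % 100) < percent;
}

/* Assumed semantics: an index in [0, n), for n > 0. */
int rand_choice(int n)
{
    return rand() % n;
}
```

With these, 0 never succeeds and 100 always does, which is enough to drive the state machine above.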
>A common thing that shows up here is random name generation. Most of
>POWDER uses a syllable based approach, which I like as you can control
>the feel of the words, but it is a lot of work to get a large
That's pretty cool. I don't really do fantasy settings, but it's
still neat. I wonder if the same idea can be used to randomize
specific kinds of names, like demons would have,
http://en.wikipedia.org/wiki/List_of_demons, or Lovecraftian types of
names. They have a different feel to them. I only bring it up
thinking about a horror genre RL.
Of course, after I wrote this, I found
http://www.seventhsanctum.com/index-name.php, which has a Lovecraftian
name generator (along with a bunch of others), but doesn't say how
they are generated. I've really never given it thought. I guess I
imagined just having some crazy lookup table...
> That's pretty cool. I don't really do fantasy settings, but it's
> still neat. I wonder if the same idea can be used to randomize
> specific kinds of names, like demons would have,
> http://en.wikipedia.org/wiki/List_of_demons, or Lovecraftian types of
> names. They have a different feel to them. I only bring it up
> thinking about a horror genre RL.
There's a little program called NameMage at http://www.mapmage.com/namemage.htm
What it does is to randomly pick syllables from three different lists:
beginning, middle and final. That way, the names are always
pronounceable, but, what's more important, they "make sense" and have a
certain feeling (no Lovecraftian names, but there are Eddings styled
names, Tolkien styled names... and you can add anything you want,
since it uses simple text files as database).
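The three-list idea can be sketched in a few lines. This is only an illustration: the syllables below are placeholders made up for the example, not NameMage's data, and the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Placeholder syllables invented for this sketch. */
static const char *const name_begin[]  = { "Bel", "Dor", "Gal", "Ith" };
static const char *const name_middle[] = { "", "a", "en", "ri" };
static const char *const name_final[]  = { "dil", "mir", "or", "wen" };

#define COUNT(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Writes beginning + middle + final into out (at most cap bytes incl. NUL). */
void syllable_name(char *out, size_t cap)
{
    snprintf(out, cap, "%s%s%s",
             name_begin[rand() % COUNT(name_begin)],
             name_middle[rand() % COUNT(name_middle)],
             name_final[rand() % COUNT(name_final)]);
}
```

Swapping in a different trio of lists changes the feel of the names without touching the code, which is the appeal of keeping the lists in text files.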
What about book/scroll titles? Often they are preprogrammed, but I
prefer to randomize them (found the feature when browsing Moria source
code and loved it). Now, the fun part is not just to generate a name
from a fixed list of syllables, but rather selecting the syllables so
that the result is always more or less pronounceable. It's easy to let
syllables "a","aa","ar","ba" be used, but what if "baaaaar" is
generated? Looks awful. BTW, the Lovecraftian name generator made
"Holl-deggggnt" among others. Four g's! And no Abdul Al-Hazred, I'm
afraid... :) Picking syllables isn't that easy.
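One cheap guard against results like "baaaaar" is to reject any candidate in which the same letter repeats too many times in a row. A minimal sketch, with a hypothetical function name and a caller-chosen limit:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if any character repeats more than max_run times in a row. */
bool has_long_run(const char *s, int max_run)
{
    int run = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        run = (i > 0 && s[i] == s[i - 1]) ? run + 1 : 1;
        if (run > max_run)
            return true;
    }
    return false;
}
```

With max_run set to 2, both "baaaaar" and "Holl-deggggnt" would be thrown away and regenerated, while "Holl" alone passes.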
There's more to it. What if you need the scroll titles in various
languages? You need either certain rules of joining syllables for each
language or different sets of syllables (the latter is my case). Try
writing something like that, it's really fun to play with!
That's pretty neat. Thanks for sharing, Jeff.
POWDER's random names have always been at most this pronounceable.
It doesn't bother me, as really foreign languages don't transliterate
well into the Latin alphabet. (One example, from when I had to implement
web pages in it without being able to read it: Gaelic.) Even Old and
Middle English don't transliterate well into the Latin alphabet.
> Jeff, you'd need a set of rules to generate really pronouncable names.
Well, I think all the example names Jeff listed can be pronounced on way
You just have to keep in mind that the native language of the player
who has to use these names does not necessarily provide the "correct" way
of pronouncing something. This, of course, implies that the
categorization in Jeff's source code (vowels, fricatives, plosives,
"weird") would have to be adjusted a bit to be more general.
The problem here is that the source already uses letters, while it
should start with phonemes and transform the phonemes to graphemes
For that, we'd need a way of representing phonemes in source code, i.e.
we'd need to enter and process phonetic spelling in the source. Today,
the IPA (International Phonetic Alphabet) with many many special
characters is used to write down phonemes.
This was problematic in the early days of computing and internet, so
alternate systems were developed. These could be applied easily to
source code in a roguelike, too, for example the X-SAMPA-system (see
http://en.wikipedia.org/wiki/X-SAMPA for details).
STEP 1: DEFINE STRINGS FOR PHONEMES
So, in a first step we would need to define strings for every phoneme we
want to use. For most phonemes used in the English language, X-SAMPA
offers simple one-character-representations, so the "th" like in "then"
or "this" is written as "D" (actually, single phonemes are embraced by
brackets, so it would be <D>).
STEP 2: COMBINE PHONEMES, USING CERTAIN RULES
Now, when we have our phoneme strings, we can combine them by certain
rules (this is the second step). These rules can be derived from
For example, German has no English "th" (resp. <D> in X-SAMPA), so that
phoneme should not be used when creating names that native German
speakers are meant to pronounce easily (on the other hand, using
foreign phonemes adds spice to the game, so actually I would always use
some alien phonemes).
Another example (again from standard German) would be the phoneme <x>
(like in "Loch", where "ch" is the phoneme <x>) -- this never occurs at
the beginning of a standard German word, but may occur in the center or
at the end.
The name generator would have to respect rules like these
when combining phonemes.
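Steps 1 and 2 could be sketched as a small rule table: one entry per phoneme and language, holding the positions in which the phoneme may appear. The struct, the flag names and the function are all hypothetical; only the two German facts (no "th" phoneme, and the "ch" of "Loch" never word-initial) come from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { LANG_EN, LANG_DE };
enum { POS_INITIAL = 1, POS_MEDIAL = 2, POS_FINAL = 4, POS_ANY = 7 };

struct rule {
    const char *xsampa;   /* X-SAMPA symbol for the phoneme */
    int lang;             /* language the rule belongs to */
    int positions;        /* bit mask of allowed word positions */
};

static const struct rule RULES[] = {
    { "D", LANG_EN, POS_ANY },                 /* "th" as in "then" */
    { "x", LANG_DE, POS_MEDIAL | POS_FINAL },  /* "ch" as in "Loch" */
    { "t", LANG_EN, POS_ANY },
    { "t", LANG_DE, POS_ANY },
};

/* True if the language has the phoneme and allows it at this position. */
bool phoneme_allowed(const char *xsampa, int lang, int position)
{
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++)
        if (RULES[i].lang == lang && strcmp(RULES[i].xsampa, xsampa) == 0)
            return (RULES[i].positions & position) != 0;
    return false;   /* phoneme is not in this language's inventory */
}
```

A generator would call phoneme_allowed before appending each phoneme, and could deliberately skip the check now and then to keep some foreign spice.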
STEP 3: TRANSFORM PHONEMES INTO GRAPHEMES
In the third step, we have to transform the phonemes into graphemes,
i.e. the phonemes have to be replaced by actual letters used in a
For example, the <x>-phoneme would have to be replaced by the letters
"ch" in standard German. The <D>-phoneme would have to be replaced by
the letters "th" in English.
If we use phonemes that do not occur in a certain language, we could
also transcribe them, so "th" or <D> could be transcribed to "v" or "w" --
in general a letter that represents a phoneme that is as similar as
possible to the foreign phoneme. (But as I said before, I would leave
the foreign phoneme as it is, to keep the spice).
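Step 3 is then a lookup from phoneme to spelling for the chosen output language. A sketch under the same assumptions as before (hypothetical names, a tiny table); foreign phonemes keep their home spelling here, which is the "keep the spice" option rather than the "nearest native letter" option.

```c
#include <stddef.h>
#include <string.h>

enum lang { LANG_EN, LANG_DE, LANG_COUNT };

struct spelling {
    const char *xsampa;                 /* phoneme in X-SAMPA */
    const char *grapheme[LANG_COUNT];   /* letters per output language */
};

static const struct spelling SPELLINGS[] = {
    /* phoneme   English  German */
    { "D",     { "th",    "th" } },   /* foreign to German: kept as "th" */
    { "x",     { "ch",    "ch" } },   /* foreign to English: kept as "ch" */
    { "t",     { "t",     "t"  } },
};

/* Returns the letters for a phoneme in the output language, or "?" if unknown. */
const char *to_grapheme(const char *xsampa, enum lang out)
{
    for (size_t i = 0; i < sizeof SPELLINGS / sizeof SPELLINGS[0]; i++)
        if (strcmp(SPELLINGS[i].xsampa, xsampa) == 0)
            return SPELLINGS[i].grapheme[out];
    return "?";
}
```

To transcribe foreign phonemes instead, only the table entries change, for example giving the "D" row a "w" in the German column.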
CREATING A NAME
When we now want to create a name, we would have to call the function in
a way like (pseudocode is Pascal style):
MyName := CreateName(CreatingMode: String; OutputMode: String);
where the parameters are strings determining which rules have to be
applied, for example
MyName := CreateName('EN DE', 'DE');
would create names with English and German phonemes and rules, but
output them in a way German speakers could read and speak easily.
MyName := CreateName('-', '-');
would use all available phonemes and rules and output them in their
standard transcription (so "th" resp. <D> would always stay "th").
WHY TO USE THIS?
Well, to satisfy a linguist's needs ;) ... No, there are at least two
1. Different regions in a game (and the NPCs who live there) can have
different styles of names. If we had a game that resembled northern
Europe, we could generate phonetically adequate names -- without the
need for re-defining the name generator. If we had an African region or
an Asian region, we could use the same generator to produce adequate
names -- because we use a phonetic alphabet like X-SAMPA.
2. If a game is run by a German player, the names generated could keep
that in mind -- and mainly generate names he/she can pronounce. The same
goes for all other languages, of course.
So, regardless if this concept is used by anybody (or even implemented),
I had fun while thinking about it ;)
Aaah, I am so stupid. Phonemes are marked with slashes, so the D I spoke
of would be /D/. (The brackets < and > are for graphemes instead.)
> For that, we'd need a way of representing phonemes in source code, i.e.
> we'd need to enter and process phonetic spelling in the source. Today,
> the IPA (International Phonetic Alphabet) with many many special
> characters is used to write down phonemes.
> This was problematic in the early days of computing and internet, so
> alternate systems were developed. These could be applied easily to
> source code in a roguelike, too, for example the X-SAMPA-system (see
> http://en.wikipedia.org/wiki/X-SAMPA for details).
> STEP 1: DEFINE STRINGS FOR PHONEMES
> So, in a first step we would need to define strings for every phoneme we
> want to use. For most phonemes used in the English language, X-SAMPA
> offers simple one-character-representations, so the "th" like in "then"
> or "this" is written as "D" (actually, single phonemes are embraced by
> brackets, so it would be <D>).
STEP 1a: DECIDE WHICH PHONEMES SOUND THE SAME TO A NATIVE SPEAKER. ;)
That way you can have passphrases that the @ can have a hard time
getting right (e.g., lollapalooza for Japanese-native @).
> STEP 1a: DECIDE WHICH PHONEMES SOUND THE SAME TO A NATIVE SPEAKER. ;)
> That way you can have passphrases that the @ can have a hard time
> getting right (e.g., lollapalooza for Japanese-native @).
Well, the way to do that is to decide what phonemes are actually used
in languages the speaker knows. Japanese has only one voiced alveolar
approximant consonant; since 'l' and 'r' are both voiced alveolar
approximant consonants, they both map to that same Japanese phoneme.
This doesn't happen at random; the sounds that are indistinguishable
to someone are sounds that are 'alike' for purposes of their own
language, which means they're formed approximately the same way in
Consonants are formed by blocking the airflow from the lungs. Which
consonant, at least in English and European languages, depends on
whether nasalization (passage of air through the nose) is allowed,
where the blockage takes place, and how much the air is blocked.
Here's a table of English consonants: (use a fixed-size font).
From left to right it goes further back in the vocal tract, and
from top to bottom it describes degree of stoppage.
            labial  l-dental  dental  alveolar  palatal  velar  glottal
stop        p                         t                  k
fricative           f         th      s         sh              h
Nasals are always voiced; otherwise they're just silent breathing.
I have abbreviated 'labio-dental' (formed between lips and teeth).
Dental, alveolar, and palatal consonants are formed with the tongue
against the teeth, against a point about a centimeter behind the front
teeth where the palate changes shape, and against a point further back
in the palate than that, respectively.
Some languages differentiate a set of consonants formed further back
than the 'palatal' consonants of English, and linguists also call
these 'palatal.' In that context the English palatal consonants
are called 'alveolar-palatal'.
A stop is a stoppage of all airflow. Since this is silent, the
consonant is formed when air is released. These consonants are
also called plosives. Fricatives impede airflow enough to cause
audible friction. Approximants barely impede airflow; just enough
to be heard. An affricate starts with a stop, like a plosive. But
instead of a sudden release of air, it transitions into a fricative.
You can see what gives Japanese speakers such a problem in English:
we distinguish two different voiced consonants formed at the same
point of stoppage and with approximately the same degree of stoppage!
Both 'l' and 'r' are voiced alveolar approximants, but the 'l'
is more closed than the 'r'; to pronounce it you have to have the
tongue nearer the roof of the mouth. Russian and Gaelic use this
distinction a lot more than English, so much that you pretty much
need an extra category in the 'stoppage degree' series to describe
We have two different 'th' sounds, one voiced and one unvoiced.
Although we think of these as the same, they are not, and this
is a point where English speakers face the same problem in other
languages that Japanese speakers face in English.
Anyway, different languages have different sets of consonants,
some the same consonants that English has and some not. A lot
of the boxes that English leaves 'empty' in the table above are
used in other languages. (for example, the velar fricative, or
guttural 'ch' that actual knights used to pronounce the word
'knight' (middle English 'knicht') is now conspicuously absent
in English, but German still uses it (as in 'Bach' for example)).
And whatever sounds a language doesn't use, tend to get grouped
with whatever 'nearby' sounds that language does use. Almost
always, sounds that get confused will be sounds adjacent to
one another in the above chart.
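The "adjacent on the chart" idea can be turned into a rough distance measure. The sketch below encodes only the stop and fricative entries that survive in the chart above, and it assumes those two rows sit next to each other, which may not match the full chart; all names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Columns of the chart, from the front of the mouth to the back. */
enum place { LABIAL, LABIO_DENTAL, DENTAL, ALVEOLAR, PALATAL, VELAR, GLOTTAL };
/* Only the two rows that survive in the chart; adjacency is an assumption. */
enum manner { STOP, FRICATIVE };

struct consonant { const char *spelling; enum place place; enum manner manner; };

static const struct consonant CHART[] = {
    { "p", LABIAL, STOP }, { "t", ALVEOLAR, STOP }, { "k", VELAR, STOP },
    { "f", LABIO_DENTAL, FRICATIVE }, { "th", DENTAL, FRICATIVE },
    { "s", ALVEOLAR, FRICATIVE }, { "sh", PALATAL, FRICATIVE },
    { "h", GLOTTAL, FRICATIVE },
};

static const struct consonant *find_consonant(const char *spelling)
{
    for (size_t i = 0; i < sizeof CHART / sizeof CHART[0]; i++)
        if (strcmp(CHART[i].spelling, spelling) == 0)
            return &CHART[i];
    return NULL;
}

/* Steps between two consonants on the chart, or -1 if either is unknown. */
int chart_distance(const char *a, const char *b)
{
    const struct consonant *ca = find_consonant(a);
    const struct consonant *cb = find_consonant(b);
    if (ca == NULL || cb == NULL)
        return -1;
    return abs((int)ca->place - (int)cb->place)
         + abs((int)ca->manner - (int)cb->manner);
}
```

A pair at distance 1, such as "th" and "s", is a candidate for being confused by a listener whose language has only one of the two.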
Also, some languages use distinctions between consonants that
English does not recognize at all. Chinese and Hindi use
aspiration, whether a stop is released with a 'puff' or just
released, and this gives them additional 'stop' consonants
that English speakers cannot usually distinguish from 'p',
'b', 'k', 'g', 't', and 'd'.
Languages tend not to have more than 20 or 30 consonants;
otherwise they start running together and it begins to be very
hard to pronounce words so that they remain distinct from
other words. In this chart, almost every consonant in English
has 'empty' space on at least two sides. That's why you can
understand English in a high wind, mumbled by a drunk, shouted
from a hundred meters away, or in the middle of radio static.
The restricted choices mean sounds that are lost or distorted
have only a limited number of ways to be resolved; in context
rarely more than two. Lots of different consonants are a
luxury for languages that rarely face interruption, noise,
stressed speakers, shouting, slurring, or distortion.
All this is supplemental information provided by a confessed
hardcore language geek; actually using it in a roguelike game
would please hardcore language geeks the world over, but is
probably a lot more work than gameplay justifies.
(ps. Vowels are much more complicated than consonants, but
also interpreted much more forgivingly by most listeners.)
Your explanations were great, especially concerning the "sound the same"
aspect. I had the same in mind as you, but it would have taken me a lot
of time to explain it in a simple way.
> hardcore language geek; actually using it in a roguelike game
> would please hardcore language geeks the world over, but is
> probably a lot more work than gameplay justifies.
But it would be certainly innovative, and 2 practical uses I stated in
location name generation table (linear):
0000   "Old "   "Chi"   "ba"    " City"
0001   "New "   "Ro"    "man"   " Sur"
1000   "Zig "   "Ba"    "cal"   " Highway"
It was just a little stab at something that could be greatly expanded
upon if more bits were used.
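One reading of the table, assumed here, is that each column is indexed independently, so parts from different rows can be mixed. A sketch using only the three rows shown (the function name and the plain-index interface are made up for the example; a real version would pull the indices from a seed or from bit fields):

```c
#include <stdio.h>

static const char *const PREFIX[] = { "Old ", "New ", "Zig " };
static const char *const FIRST[]  = { "Chi", "Ro", "Ba" };
static const char *const SECOND[] = { "ba", "man", "cal" };
static const char *const SUFFIX[] = { " City", " Sur", " Highway" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Builds a location name from one index per column (indices wrap around). */
void location_name(char *out, size_t cap,
                   unsigned a, unsigned b, unsigned c, unsigned d)
{
    snprintf(out, cap, "%s%s%s%s",
             PREFIX[a % COUNT(PREFIX)], FIRST[b % COUNT(FIRST)],
             SECOND[c % COUNT(SECOND)], SUFFIX[d % COUNT(SUFFIX)]);
}
```

Indices (0, 0, 0, 0) give "Old Chiba City" and (1, 1, 1, 1) give "New Roman Sur"; because the lists are sized with COUNT, each column can grow to any length.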
Don't all three of those use Latin script normally?
Transliteration is not the problem. Latin alphabet *can* give the feel
of different languages if you work out your syllable set a bit. Here
are some examples of what my scroll title generator comes up with:
* Dos vorasco
* Muerir guabioso doivosdosteis
* Vaso deldieco cuanede
* Gwyt'entha beltodhe vaelsei
* S'hea dheygnamil
* Aipakravoe ceaess'
* Akt eangjen jegmotfremig
* Enmydis lostdehjer trerforlen
* Taomskaferd svagrimen
They are not really meant to be pronounceable (even though normally
they are, more or less), but they all sound very different from each
other. Maybe you even identify the languages I took syllables from :).
Logically, I omitted all diacritical marks (the transliteration into
Latin alphabet problem) and they still look quite fine. They all
potentially are foreign languages both to the player and to the PC :).
It's also important to account for your language's standard
orthography. For example, "ej" couldn't be an English word, but "edge"
is. "Dgar" would not be, but "jar" is. "Rayj" isn't, but "rage" is. Even
though the phoneme can appear in either of those places, it must be
A workable rule would probably be: use "j" in syllable-initial positions
and "dge" in syllable-final positions, except after a long vowel, where
you should use "ge". Of course, if
you're attempting to simulate a made-up language you can ignore special
cases, but maybe you're trying to blend the words in (like Nethack's
"foobie bletch"), which is the case I'm pondering.
It might be easier to start and finish with graphemes based on
probabilities from your target language, disallowing any real words, if
you're trying to make "natural" words for some language. Although if
you're going to disallow real words, you might want to ship with a list of
"unused words" rather than generating them on the fly.
Gaelic need not, although I only saw the alternate reconstructed
alphabet in PDFs I did not need to personally edit. Old English
benefits from the letter "thorn" if one is being truly pedantic about
rendering the texts.
All three will be completely mispronounced if you try to sight-read
the Latin-alphabet form with any dialect of modern English. The Great
Vowel Shift is much larger than the little deviations (broadly,
Australian/American/Queen's English) since Shakespeare.
So will French or German be, or even Latin itself.
Less badly than Gaelic, though. In fact, unless you actually *know* the
rules of Gaelic orthography, it's close to impossible to pronounce a
passage of Gaelic correctly given its spelling. You won't even get the
consonants right, let alone the vowels.
You'll get all the consonants close to right on Latin and French, and
almost right in German (hard ch loses), and at least land close to the
vowels in Latin and French. (I have studied French heavily but German
only lightly, but what little German I've had a chance to listen to
wasn't that far off from what I'd sight-read. I don't know *where*
that German was from, however -- reputation is that the divergence
between North and South Germany in pronunciation used to approach
Gaelic is hopeless (where did all the vowelless[sic] syllables come
from), and as mentioned at least half the vowels will be
systematically completely wrong in Old and Middle English.
IIRC, the two sounds are in thin vs then. Likewise, we have two
different 'l' sounds: laugh vs ball.
Thank you for the thorough treatment of the subject.
That's my theory precisely. By grapheme, I might even go a bit higher level
than a-z, and include sh, ch, etc., as graphemes. (Apologies if the
correct definition of grapheme already groups those...) For this name
generator, I was not aiming for *any* "feel" to the resulting names.
I just wanted a uniform sampling of all names that could be "read" by
an English speaker. I don't see any reason to disallow real words
Ha ha ha! Czech also has vowelless syllables, in fact, you can
sometimes form a fully coherent phrase without a single vowel.
Also, Portuguese has a certain level of vowel reduction, although only
in pronunciation. They ran a little test one day. People from Portugal
and from Sweden or Norway (can't remember exactly) listened to series
of words in Portuguese and counted syllables. The Scandinavians, on
the average, heard two syllables less than native Portuguese
speakers :). For instance, the Portuguese word "Excepção" is
pronounced more or less "ishsã" (only two syllables instead of three).
Further reductions include certain consonant groups: mn, ct, cc, pc,
etc. They were reduced to single phonemes, sometimes even affecting
the orthography ("aqueducto" -> "aqueduto", pronounced "aekuedutu").
By analyzing some Slavonic languages, I also reach the conclusion that
the vowels have different degrees of disappearance in different
tongues. For instance, the Russian word "korol'" (king, two syllables)
is "król" (one syllable) in Polish and "krl" (one vowelless syllable)
in Czech. No wonder the Czech understand Polish quite well, but the
Poles don't understand Czech. Same happens with Spaniards and
Portuguese (the Portuguese don't bother putting subtitles to Spanish
movies, the Spaniards don't understand a word of what a Portuguese
And one last thing. Are we still talking about random name generators?
This is the syllable-based approach I referred to. I use this for
artifact names and unique creature names in POWDER. I like it because
you have a lot of control over the feel of the resulting names. My
only suggestion would be to eliminate your dependency on bit fields -
free the lists to be arbitrary lengths and forget about byte packing.
Your town name generator shouldn't be constrained by magical powers of
A longer sample set is needed if the names are short. If certain
syllables show up too often for taste, a little hand-tweaking of the
basis set takes care of it.
> On Jun 29, 12:51 pm, Ray Dillinger <b...@sonic.net> wrote:
>> We have two different 'th' sounds, one voiced and one unvoiced.
>> Although we think of these as the same, they are not, and this
>> is a point where English speakers face the same problem in other
>> languages that Japanese speakers face in English.
> IIRC, the two sounds are in thin vs then.
And in "think" and "these", which was my intended example, although
I probably should have pointed that out. :-)
A few years ago we had a discussion on that topic with a nice Markov chain
based implementation. Why reinvent the wheel? :)
It's probably not an issue with the sort of generator you're using. In
the case of a generator that attempted to more closely approximate some
real language* using data based on that language, I'd think that the
danger of generating words that already exist would be higher. And I
don't want "You find a scroll labeled 'Ass Hat'" to be in my game. :)
* I have no idea why a person would want to do this except for the sheer
delight of over-engineering their random words.
> Paul Donnelly <paul-d...@sbcglobal.net> wrote:
>>> All three will be completely mispronounced if you try to sight-read
>>> the latin-alphabet form with any dialect of modern English. The Great
>>> Vowel Shift is much larger than the little deviations (broadly,
>>> Australian/American/Queen's English) since Shakespeare.
>>So will French or German be, or even Latin itself.
> Less badly than Gaelic, though. In fact, unless you actually *know* the
> rules of Gaelic orthography, it's close to impossible to pronounce a
> passage of Gaelic correctly given its spelling. You won't even get the
> consonants right, let alone the vowels.
I don't know about that, I routinely mangle French (although the one or
two Gaelic words I know indicate I would mangle Gaelic just as badly). I
doubt many non-English speakers could guess their way properly through
an English sentence either.
But that's really neither here nor there, as my point was that it's amusing to
say that any of those languages are difficult to represent with the
Latin alphabet, since that's what they are *conventionally* written
Thank you for the link. I rejected a markov chain approach as I
wanted a very wide list of available names not strongly biased by some
dictionary. In some ways, you can consider this to be a markov chain
with an implicit transition table.
The other problem I have with a letter-based markov generator is that
it is hard to add enough history to ensure you get valid pronounceable
names (ie, not too many consonants in a row) without resulting in a
degenerate generator that barely deviates from your sample set.
The last post in that discussion addresses this by layering an
additional restriction, essentially prohibiting triple consonants. An
issue I have with that is not all double consonants are necessarily
easy to pronounce - thus my inspiration from phonemes to classify the
consonants by pronunciation to force them to chain in the proper
fashion and return to a vowel swiftly enough.
I do like the idea of digging down to the phoneme level to
concentrate on building names; my main concern is that the orthography
you impose on your phonemes will likely have more effect on the names
than the actual phonemes you pick. ej vs edge is an excellent example
- my perfect name generator would produce both those options rather
than only one of them.
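The j / dge / ge spelling rule suggested earlier in the thread fits in one small function, and a flag is enough to let a generator emit the plain "j" spelling as well, so that both "ej" and "edge" can come out. A sketch with hypothetical names; how often to set the flag is left to the caller.

```c
#include <stdbool.h>

enum syl_pos { SYL_INITIAL, SYL_FINAL };

/*
 * Spelling for the "j" sound (X-SAMPA dZ).
 * conventional == false ignores English orthography and always gives "j".
 */
const char *spell_dZ(enum syl_pos pos, bool after_long_vowel, bool conventional)
{
    if (!conventional || pos == SYL_INITIAL)
        return "j";                          /* "jar", or raw "ej" */
    return after_long_vowel ? "ge" : "dge";  /* "rage" vs "edge" */
}
```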
Oh, I don't think it's exactly reinventing the wheel. There are so
many things to discuss in this subject! One thing that has just sprung
to my mind: how do you make the names be masculine or feminine?
Still, thanks for the link. Made me want to rewrite my random scroll
title generator using Markov chains :).
> Oh, I don't think it's exactly reinventing the wheel. There are so
> many things to discuss in this subject! One thing that has just sprung
> to my mind: how do you make the names be masculine or feminine?
Perhaps a grouping of "hard" vs "soft" sounding words?
la- lau- lu- ver- -ra -isha -fi for start/ends of female names
kr- ka- to- pa- -rick -hat -kon for start/ends of male names
I'll simply create different sets of syllables. I think it's easier.
I got my name generator running (wrote it this evening). The syllable
base is taken from Eddings and Tolkien names (it still might need a
few tweaks, I'll have to have a closer look at all the generated names
and try to determine whether there's a syllable or two that doesn't
Here's a sample list of names (the generator outputs 100 names to a
Belregoyon, Isvon, Kidil, Rhoregoldil, Yaryon, Dagldir, Kidin,
Arthrin, Grinharamas, Hurdin,
Galromilen, Unratir, Mankaltholen, Amthel, Borduleg, Amduhor,
Samgoyon, Aerrach, Breglatharan,
Todunara, Brendar, Chaldor, Hardig, Zedir, Besus, Erihor, Hathhir,
Fulmas, Baldin, Hanrig, Unhek,
Ulflathavor, Halzilagund, Camdas, Urthntir, Balruirig, Anharavon,
Manrominik, Dunarilach, Bogorn,
Branbar, Samgorn, Gethad, Erdunahir, Radhgahek, Dairroddamas,
Ragmalen, Lelzilarath, Gontir,
Bregruidur, Grinlathaneg, Hurgund, Yarroblek, Altar, Elenlathadas,
Malra, Nadan, Isruihell, Kihir,
Elenkan, Barregorin, Cherain, Hanrorig, Badur, Isregobers, Toneg,
Dagziladin, Aerhor, Yarlethel,
Chalmamas, Ragbar, Rellin, Bodurig, Drobar, Zavor, Chalrain,
Borlathaldir, Samdrakan, Camrand,
Brodsta, Runhell, Hathharagen, Gillen, Mafast, Urthkor, Gilsus, Todar,
Erizilavon, Harderek, Balathasin, Malgas, Dairharadur, Gotoldir,
Turtoran, Dromnir, Hathrevon,
I'll try to get the female name generator running, then maybe I'll use
demon names or something :). Now, generating race-dependent names
might be trickier (dwarves, elves, orcs...).
I had success using markov chains following transition between letter
pairs, rather than single letters. This solved most of the issues with
triple+ letters. You're spot on with needing a fairly large data set
for the short names.
With my name generators I extract the transition tables from name
lists of different languages. It works well.
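A letter-pair Markov generator of the kind described in the last two posts might look roughly like this. It is a sketch, not either poster's code: it assumes lowercase a-z input, keeps a count table indexed by the previous two letters, and uses one extra symbol to mark both the start and the end of a name. All names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define SYMS 27   /* 'a'..'z' plus one boundary symbol */
#define EDGE 26   /* marks both the start and the end of a name */

/* counts[a][b][c]: how often symbol c followed the pair "ab" in training. */
static unsigned counts[SYMS][SYMS][SYMS];

static int sym(char ch)
{
    return (ch >= 'a' && ch <= 'z') ? ch - 'a' : EDGE;
}

/* Record every pair -> next-letter transition of one lowercase name. */
void markov_train(const char *name)
{
    int a = EDGE, b = EDGE;
    for (size_t i = 0; ; i++) {
        int c = sym(name[i]);     /* the terminating NUL maps to EDGE */
        counts[a][b][c]++;
        if (name[i] == '\0')
            break;
        a = b;
        b = c;
    }
}

/* Generate a name into out (cap bytes including the NUL). */
void markov_name(char *out, size_t cap)
{
    int a = EDGE, b = EDGE;
    size_t pos = 0;
    while (pos + 1 < cap) {
        unsigned total = 0;
        for (int c = 0; c < SYMS; c++)
            total += counts[a][b][c];
        if (total == 0)
            break;                /* this pair was never seen in training */
        unsigned pick = (unsigned)rand() % total;
        int c = 0;
        while (pick >= counts[a][b][c])   /* weighted choice of next symbol */
            pick -= counts[a][b][c++];
        if (c == EDGE)
            break;                /* the chain chose to end the name */
        out[pos++] = (char)('a' + c);
        a = b;
        b = c;
    }
    if (cap > 0)
        out[pos] = '\0';
}
```

Training on one name list per language and keeping a separate table for each gives the language-specific feel mentioned above; with only a handful of training names the output will mostly reproduce the inputs, which is the small-sample problem already noted.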