Machine learning for a conversational system


Mike Rozak

Nov 27, 2003, 9:54:55 PM
In a prior discussion I commented that I didn't think that OO programming
was necessarily the best way to implement an NPC conversation. Last night I
woke up with a partial explanation why. The solution involves machine
learning. It's a lot of work, and even with the basics that I fleshed out,
it's probably only 10% of the whole conversation puzzle. I don't expect to
code this anytime soon (definitely not in the next few years) but I thought
I'd give a brain dump in case anyone has copious amounts of spare time.

First of all, let me state that I'm a fan of "machine learning". Let me
illustrate with an example:

To create a speech recognition system you go out and record several hundred
people reading various sentences. In all you end up with (for this example)
10,000 recordings and their transcriptions. You take 5,000 recordings and
use them as your "training database" and the other 5,000 as a "test
database".
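A minimal sketch of that split, assuming the corpus is simply a list of (recording, transcription) pairs; the file names, the 50/50 fraction, and the fixed seed are illustrative choices, not anything from a real speech toolkit:

```python
import random

def split_corpus(pairs, train_fraction=0.5, seed=0):
    """Shuffle (recording, transcription) pairs and split them into a
    training database and a held-out test database."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for 10,000 recordings and their transcriptions.
corpus = [(f"recording_{i}.wav", f"transcription {i}") for i in range(10_000)]
train_db, test_db = split_corpus(corpus)
print(len(train_db), len(test_db))  # 5000 5000
```

The test database is never shown to the learning algorithm; it is only used afterwards to measure accuracy.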

To machine-learn speech recognition, a programmer writes a pattern-matching
algorithm that learns what patterns to look for by examining the "training
database". (I can explain the pattern matching in more detail if you wish.)
That's all the human intervention in the system (kind of). (One useful
result of this pattern matching is that if you want speech recognition in
French instead of English, you get 5,000 recordings of people speaking
French and pass those into the pattern matching software instead of the
English sentences. The algorithm doesn't care.)

The efficacy of speech recognition is measured by its "accuracy" rating...
how many words it gets right vs. how many it gets wrong. You'll find that if
you do an accuracy test against the training database, speech recognition
will have a 90% accuracy. But, if you pull the test database out of the
vault, it will only have an 80% accuracy. This is because some of what the
pattern matching algorithm learned was specific to the data it learned from.
This is not desired, but it always happens.
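The post only says "how many words it gets right vs. how many it gets wrong". A common way to make that precise, assumed here, is word accuracy = 1 - (substitutions + deletions + insertions) / (words in the reference), with the error count taken from a word-level edit distance. A small sketch:

```python
def word_errors(reference, hypothesis):
    """Substitutions + deletions + insertions needed to turn the reference
    transcription into the recognizer's output (word-level edit distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def accuracy(pairs):
    """Word accuracy over (reference, hypothesis) pairs: 1 - errors / words."""
    errors = words = 0
    for ref, hyp in pairs:
        errors += word_errors(ref, hyp)
        words += len(ref.split())
    return 1.0 - errors / words

print(word_errors("how do i open the door", "how do i open a door"))  # 1
```

Running `accuracy` once over the training database and once over the test database gives the two numbers being compared; the gap between them is the over-fitting described above.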

Many decades ago the speech recognition community used a different approach;
they had a human look through the training database and write code
specifically for recognizing different phonemes. For example: To recognize a
"t" you might look for a sudden burst of energy, "f" look for noise, etc.
This is a lot of work, and it takes much longer for a person to write
code that recognizes all the phonemes using human-generated algorithms, than
it takes for a person to write a pattern recognizer that automatically
learns what the phonemes sound like.

The proof is in the pudding/statistics... If you run the training data
(which the humans examined when coming up with their own algorithms) through
the hand-written system you get 95% accuracy, higher than the automatic
pattern recognition. That's great. However, when you run the test database
(which the humans have never seen) you unexpectedly get only 60% accuracy,
much worse than the machine learned system.

The reason is that the hand-tuned code is "brittle". It works great for the
cases that its builders anticipated, but fails miserably when it sees
something its human builders haven't.

IF text parsers have the same problem, especially when dealing with
conversations. One aspect of this problem is the old "guess the verb" issue.

That's why I like machine learning. Ultimately (for realistic
conversations), it's easier to program, easier for the author, more
localizable, and more robust. BUT, it isn't obvious to me how to use machine
learning for a conversation.


Here's an idea to start with. As I said earlier, it's a lot of work, and
only partially solves the problem.

First, simplify the problem from "conversation" to a "query"... this isn't
that much of a leap since NPC interaction tends to be more on the query side
anyway: "What do you know about the riddle of the sphinx?", "How do I open
the door?" etc. Very few NPCs will answer and then follow up with "Speaking
of sphinxes, how is your mother doing?"


Step 1:

The author comes up with 100 possible questions that the player might speak
to the NPC, along with the responses. The author indicates the keywords by
capitalizing them, such as "What do you KNOW about the RIDDLE of the
SPHINX?". Alternate versions of the question can also be typed in: "What is
the ANSWER to the RIDDLE of the SPHINX?" Also, a synonym table can be
created "sphinx" = "statue", etc. The author may even indicate which words
have nil content, such as "please", "the", "of", etc. (Note that the
synonyms and content-free words are generally (but not always) constant for
all conversations within the language. Thus, the database can be created
once and used for all NPCs and/or all IF. That's the theory. In practice the
database of synonyms will vary somewhat with each IF title.)

So far this isn't much different than OOP.
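As a rough sketch of the authoring data for Step 1, assuming a tiny hand-made synonym table and stop-word list (both hypothetical, and far smaller than a real shared database would be), each sample question can be reduced to its content words plus the keywords the author typed in capitals:

```python
import re

# Hypothetical shared tables. Per the post, these are mostly constant for a
# language and could be reused across NPCs, with per-game additions.
SYNONYMS = {"statue": "sphinx"}
STOP_WORDS = {"please", "the", "of", "a", "an"}

def parse_sample(text):
    """Turn an author's sample question into (content words, keywords).
    Keywords are the words the author typed in ALL CAPS."""
    words, keywords = [], set()
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.lower()
        word = SYNONYMS.get(word, word)  # map synonyms to one canonical form
        if word in STOP_WORDS:           # drop content-free words
            continue
        words.append(word)
        if token.isupper() and len(token) > 1:  # len check skips the pronoun "I"
            keywords.add(word)
    return words, keywords

# Each response can be reached from several alternate phrasings.
# The response text is a placeholder.
npc_topics = {
    "(response about the riddle of the sphinx)": [
        "What do you KNOW about the RIDDLE of the SPHINX?",
        "What is the ANSWER to the RIDDLE of the SPHINX?",
    ],
}
```

`parse_sample("What do you KNOW about the RIDDLE of the SPHINX?")` keeps the words `what, do, you, know, about, riddle, sphinx` and marks `know`, `riddle` and `sphinx` as keywords.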


Step 2:

When the user types in a query "What is the riddle of the sphinx?" this is
compared against ALL the sample sentences entered by the author. (Note:
Optimizations will mean that not ALL sentences will be examined.)

The comparison generates a "score" for each sentence, the higher the score,
the closer the match between what the user typed and the sentence. To
determine what sentence the user said, just take the result with the highest
score. If the highest score is really awful then reply "Sorry, I didn't
understand that." (This is basically what speech recognition does.)
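The matching loop itself is small. This sketch uses a placeholder score (the fraction of shared words, i.e. Jaccard overlap) and a made-up rejection threshold of 0.3; both are assumptions standing in for the real scorer described next:

```python
import re

def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def score(user_text, sample_text):
    """Placeholder score: shared words / all words (Jaccard overlap)."""
    u, s = words(user_text), words(sample_text)
    return len(u & s) / len(u | s) if u | s else 0.0

def respond(user_text, samples, threshold=0.3):
    """samples: non-empty list of (sample sentence, response) pairs.
    Compare the input against every sample and keep the best one; if even
    the best score is awful, admit defeat."""
    best_sample, best_response = max(samples, key=lambda pair: score(user_text, pair[0]))
    if score(user_text, best_sample) < threshold:
        return "Sorry, I didn't understand that."
    return best_response
```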

You can compute the score by a) seeing what percentage of words match
between the user's sentence and the sample sentence, b) seeing if words are
moved out of order and by how much, c) looking for synonym substitutions,
d) looking especially hard for keywords, e) etc. Each word match counts for N
points, a word flip removes M points, a synonym substitution removes P
points, etc.
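A sketch of that feature-based score, assuming the words have already been normalized as in Step 1. The numbers in `WEIGHTS` are invented stand-ins for N, M and P plus a keyword bonus; in the scheme described here they would be learned rather than typed in:

```python
from itertools import combinations

SYNONYMS = {"statue": "sphinx"}  # hypothetical synonym table

# Hypothetical point values standing in for N, M, P and a keyword bonus.
WEIGHTS = {"match": 1.0, "flip": -0.5, "synonym": -0.25, "keyword": 2.0}

def features(user_words, sample_words, keywords):
    """Count matches, out-of-order pairs, synonym substitutions and keyword
    hits between the player's words and one sample sentence's words."""
    matched, synonym_hits = [], 0
    for w in user_words:
        if w in sample_words:
            matched.append(w)
        elif SYNONYMS.get(w) in sample_words:
            matched.append(SYNONYMS[w])
            synonym_hits += 1
    position = {w: i for i, w in enumerate(sample_words)}
    # A "flip" is any pair of matched words that appear in the opposite order.
    flips = sum(1 for a, b in combinations(matched, 2) if position[a] > position[b])
    return {
        "match": len(matched),
        "flip": flips,
        "synonym": synonym_hits,
        "keyword": len(set(matched) & set(keywords)),
    }

def score(user_words, sample_words, keywords, weights=WEIGHTS):
    f = features(user_words, sample_words, keywords)
    return sum(weights[name] * count for name, count in f.items())

print(score(["riddle", "statue"], ["answer", "riddle", "sphinx"], {"riddle", "sphinx"}))  # 5.75
```

Swapping the order to `["sphinx", "riddle"]` costs one flip, dropping the score to 5.5.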

The number of points added/removed for a case are machine learned through
standard techniques. (Neural nets would work, but the term is so overused I
don't even want to get started.)
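The post leaves "standard techniques" open. One simple option, swapped in here purely as an illustration, is a perceptron-style ranking update: whenever the current weights pick the wrong sample sentence, nudge them toward the features of the right one. The feature layout (`[matches, flips]`) and the toy examples are assumptions:

```python
def learn_weights(examples, n_features, epochs=10, lr=1.0):
    """Perceptron-style ranking update.

    examples: list of (candidates, correct) where each candidate is the
    feature vector (e.g. [matches, flips, ...]) of one sample sentence
    scored against a player's input, and `correct` is the index of the
    sample the author says was meant.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for candidates, correct in examples:
            scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in candidates]
            predicted = max(range(len(candidates)), key=scores.__getitem__)
            if predicted != correct:
                # Move weights toward the right sample's features and away
                # from the wrongly chosen sample's features.
                for k in range(n_features):
                    w[k] += lr * (candidates[correct][k] - candidates[predicted][k])
    return w

# Two toy examples with features [matches, flips]: the intended sample is
# the one without a word flip, and the one with more matches.
examples = [([[2, 1], [2, 0]], 1),
            ([[1, 0], [3, 0]], 1)]
print(learn_weights(examples, n_features=2))  # [2.0, -1.0]
```

The learned weights reward matches and penalize flips without anyone typing those numbers in, which is the point of the step.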

If you test this system it will work, but you'll find that hand-tuned OOP
will have a higher accuracy (aka: work better).

Step 3:

When beta testers play the IF, keep a log of everything they say to the
NPCs. The beta testers E-mail this to the author (this is more difficult
than you'd expect since people are worried about privacy).

The author then runs a program that loops through all the queries that real
players typed. Using the algorithm in step 2, the program displays a menu of
the top-4 sample sentences that match. (The chances are pretty good that if
the NPC knows about the topic, it will be in the top 4. As an aside,
even though speech recognition only has 90% word accuracy, there's a 98%
chance of the correct word being in one of the top-10 candidates that the
system was considering.) As a result, the author merely presses 1, 2, 3, or 4
and the player's sentence is automatically ADDED to the sample sentences. In
the unlikely event that the correct response was not in the top 4, the author
can search through the list of 100 responses and pick one. Or, the author
can rephrase the user's question until the system gets it right, and then
add not only the original question but all the rephrasings.
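A sketch of that review loop, with the author's keypress replaced by a `choose` callback (a hypothetical hook, as is the toy word-overlap score used in the example):

```python
def top_candidates(query, samples, score, k=4):
    """The k best (sample sentence, response) pairs for the author's menu."""
    return sorted(samples, key=lambda pair: score(query, pair[0]), reverse=True)[:k]

def review_log(log, samples, score, choose, k=4):
    """Walk the testers' logged queries. `choose(query, menu)` stands in for
    the author pressing 1..k (or returning None to skip, e.g. to search the
    full list or write a new response instead). The player's own wording is
    appended as a new sample sentence for the chosen response."""
    for query in log:
        menu = top_candidates(query, samples, score, k)
        pick = choose(query, menu)
        if pick is not None:
            _, response = menu[pick - 1]
            samples.append((query, response))
    return samples

def overlap(a, b):
    """Toy score for the example: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

samples = [("what is the riddle of the sphinx", "R_SPHINX"),
           ("how do i open the door", "R_DOOR")]
review_log(["tell me the sphinx riddle"], samples, overlap, lambda q, menu: 1)
print(samples[-1])  # ('tell me the sphinx riddle', 'R_SPHINX')
```

Because new sample sentences are appended as they are confirmed, later queries in the same log already benefit from earlier choices.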

If the question requires a new response, the author can add it. Thus, the
author can come up with responses for questions he never thought players
would ask.

The newly expanded list of sample sentences (the "training database") is
recompiled into the IF and sent out. Repeat the process as long as you wish.
It will only get more accurate. (Actually, the accuracy will level off at an
asymptote, but I suspect by that point the author won't care.)

One other nifty thing about collecting all the sample sentences is that the
IF construction tool can come up with a probability that a user will ask a
specific question. For example: "What is the riddle of the sphinx?" may be
asked a lot more than "Where's the nearest toilet?". This probability can be
included in the score to make the system even more accurate. Plus, the
author, knowing that a certain question is being asked a lot, can fine-tune
the NPC's responses based on that.
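One way to fold that probability into the score, assumed here rather than taken from the post, is to estimate P(response) from how often logged questions were mapped to each response (with add-alpha smoothing so unasked responses keep a nonzero probability) and add its logarithm to the match score. The counts below are made up for illustration:

```python
import math
from collections import Counter

def make_prior(chosen, all_responses, alpha=1.0):
    """P(response) from how often logged questions were mapped to each
    response, with add-alpha smoothing."""
    counts = Counter(chosen)
    total = len(chosen) + alpha * len(all_responses)
    return {r: (counts[r] + alpha) / total for r in all_responses}

def combined_score(match_score, response, prior, prior_weight=1.0):
    """Add the (weighted) log prior to the sentence-match score."""
    return match_score + prior_weight * math.log(prior[response])

# Made-up counts for illustration only.
chosen = ["R_SPHINX"] * 8 + ["R_TOILET"]
prior = make_prior(chosen, ["R_SPHINX", "R_TOILET", "R_DOOR"])
print(prior["R_SPHINX"])  # 0.75
```

With equal match scores, the frequently asked sphinx question now outranks the toilet question; `prior_weight` controls how strongly popularity can override a better textual match.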


Step 4:

Unfortunately, there is no state with this query system. If the user asks
"Who is the king?", gets a response "It's King Leopold", and then types in
"Where is HE now?" the system won't have a clue.

This can be solved in a number of ways:

a) Keep track of pronoun candidates and do automatic pronoun substitution.
"Where is HE now" would automatically be translated into "Where is the king
now?" before being passed to the sentence scorer. (A somewhat tricky coding
task.)

b) If the NPC asked a question like "What is your favorite color?" and the
player responded with "Red", then use ELIZA's tricks to resynthesize the
sentence as "My favorite color is red". (Even trickier to code.)

c) Do what existing systems do and incorporate a state. Set a flag that the
NPC just asked for a favorite color and it should expect one of the
following 10 responses. BUT, rather than having a menu, just automagically
increase the scores of the 10 expected responses. Ex: The user's input of
"red" would normally get a score of 10 against the sample sentence "My
favorite color is RED". Just after the NPC asks the color question, raise the
score by 20 (or whatever) points for the expected responses. Then, if the
user answers "red" after the NPC asked "What is your favorite color?", the "My
favorite color is RED" response will have a score of 30 (instead of 10).
(Technically, this is using the state of the conversation to influence the
expected probability of a given response. Earlier I noted that probability
can affect the score.) Unfortunately, this requires more author intervention
to hardcode.

d) The ideal solution is to learn the probability of a response based upon
the previous queries. If you could collect enough data (which you can't) you
could learn that after the NPC asks "What is your favorite color?" the
probability of the "red" response is 10%, "blue" is 15%, "green" is 25%, and
"Where are the toilets?" is 0.001%. Speech recognition does this to enable
general purpose dictation. However, it requires a LOT of data, about 1
BILLION words of text for dictation. It ain't gonna happen for IF - although
it might work for a MMORPG. Well, it might work for IF in very broad terms:
You can be pretty sure that the first utterance in a conversation will be
"hello" or "hi", and that the last one will be "bye" or "cya", etc.
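The four options above can be sketched together, each in deliberately naive form. Everything here is an assumption for illustration: the pronoun table, the single template, treating replies of one or two words as fragments, the flat bonus of 20, and the `<start>`/`<end>` markers with add-alpha smoothing for the learned transition probabilities in d).

```python
import re
from collections import Counter, defaultdict

# a) Pronoun substitution: remember the latest referent for each pronoun kind.
PRONOUN_KIND = {"he": "male", "him": "male", "she": "female", "it": "thing"}
PRONOUN_RE = re.compile(r"\b(" + "|".join(PRONOUN_KIND) + r")\b", re.IGNORECASE)

def substitute_pronouns(text, referents):
    """referents maps a kind ("male", ...) to the last phrase mentioned;
    pronouns with no known referent are left alone."""
    return PRONOUN_RE.sub(
        lambda m: referents.get(PRONOUN_KIND[m.group(0).lower()], m.group(0)), text)

# b) ELIZA-style resynthesis of a bare answer into a full sentence.
ANSWER_TEMPLATES = {"What is your favorite color?": "My favorite color is {}."}

def resynthesize(npc_question, reply):
    template = ANSWER_TEMPLATES.get(npc_question)
    words = reply.strip().rstrip(".!?").split()
    if template and 1 <= len(words) <= 2:  # treat short replies as fragments
        return template.format(" ".join(words))
    return reply

# c) Hand-coded state: answers the NPC expects get a flat bonus.
def boost_expected(scores, expected, bonus=20):
    return {s: v + (bonus if s in expected else 0) for s, v in scores.items()}

# d) Learned state: P(next response | previous response) from logged
#    conversations, smoothed so unseen pairs keep a small probability.
def learn_transitions(conversations):
    counts = defaultdict(Counter)
    for convo in conversations:
        for prev, nxt in zip(["<start>"] + convo, convo + ["<end>"]):
            counts[prev][nxt] += 1
    return counts

def transition_prob(counts, prev, nxt, vocab_size, alpha=1.0):
    row = counts.get(prev, Counter())
    return (row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size)
```

For example, `substitute_pronouns("Where is HE now?", {"male": "the king"})` yields "Where is the king now?", and the `<start>` marker is what lets d) pick up the broad pattern that conversations tend to open with "hello" or "hi".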

Step 5:

I'm not really sure what's next, but there certainly is an awful lot. For
example, even if you could automagically learn the probability of a response
based on previous ones (which you can't), the following problems need to be
solved:

- The context of a conversation extends back in time more than one or two
sentences, sometimes it goes back years. (Unfortunately for me, years ago I
mentioned to some people when my birthday was, and they still remember it. I
don't remember theirs though.) You'd have to write OOP code to handle this
kind of behaviour.

- The machine learning doesn't handle fill-in-the-blanks such as "My name is
Mike" or "My name is Fred", etc. I suspect the algorithm could be extended
for this, but it would be easier to use OOP to code it.

- The NPC's responses will vary depending upon their mood. Sometimes they
will be more verbose than others. This means typing in multiple responses
and writing code to determine which one to use. Yuck. (Luckily, the author
will know what questions are most commonly asked and can put most of his/her
work into providing alternate responses for these.)

- You may want to use the same response for all NPCs (such as "Who is the
king?" or "What time is it?") BUT have them worded subtly differently based
upon each NPC's speaking conventions. This either requires typing in multiple
responses, OR if you're really clever you could make a translator. Years ago
I heard there was a program that converted normal English text into "Jive".
Likewise, you could do a Canadian translator by adding "eh?" onto the end of
every sentence, or Australian by using "mate" a lot and converting "Hello"
to "G'day". (OK. So this example is dealing in stereotypes. Although most
Canadians do use "eh" a bit too much, the Australians I know don't go
around saying "G'day" to everyone. I just wanted to give an obvious example.)

- I'm sure there are other issues as discussed in posts over the years.
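The translator idea from the list above can be sketched as a table of regex rewrites plus an optional per-sentence suffix. The rule tables are hypothetical and as crude as the stereotypes themselves; a real "speaking conventions" layer would need far more care:

```python
import re

# Hypothetical rule tables, as crude as the stereotypes in the post.
DIALECTS = {
    "canadian": {"replace": [], "suffix": ", eh?"},
    "australian": {"replace": [(r"\bHello\b", "G'day"), (r"\bfriend\b", "mate")],
                   "suffix": ""},
}

def translate(text, dialect):
    """Rewrite one shared response into an NPC's speaking style."""
    rules = DIALECTS[dialect]
    for pattern, replacement in rules["replace"]:
        text = re.sub(pattern, replacement, text)
    if rules["suffix"]:
        # Split after sentence-ending punctuation and re-end each sentence.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        text = " ".join(s.rstrip(".!?") + rules["suffix"] for s in sentences)
    return text

print(translate("It's King Leopold. He is in the tower.", "canadian"))
# It's King Leopold, eh? He is in the tower, eh?
print(translate("Hello, friend.", "australian"))  # G'day, mate.
```

The author writes each shared response once; the per-NPC flavor is applied at output time.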


Step 6:

What I described was only for queries. It doesn't handle conversations. To
get a conversation you need to have the NPC ask questions back. This
involves following a conversation path, NPC motivation, etc. I doubt this
could be machine learned. It has to be coded. (It could be coded in a
"fuzzy" manner though, instead of absolute OOP.)

Although, now that I think of it, if you had a transcript of a player
talking with the author (who was pretending to be the NPC) you might be able
to get statistics on this. It would be too much work.


Sound complicated enough?

--

Mike Rozak
www.mXac.com.au


Quintin Stone

Nov 28, 2003, 2:42:55 PM
On Fri, 28 Nov 2003, Mike Rozak wrote:

> Sound complicated enough?

Yes.

/====================================================================\
|| Quintin Stone O- > "You speak of necessary evil? One ||
|| Code Monkey < of those necessities is that if ||
|| Rebel Programmers Society > innocents must suffer, the guilty must ||
|| st...@rps.net < suffer more." -- Mackenzie Calhoun ||
|| http://www.rps.net/QS/ > "Once Burned" by Peter David ||
\====================================================================/

ems...@mindspring.com

Nov 28, 2003, 8:22:34 PM
"Mike Rozak" <Mike...@bigpond.com> wrote in message news:<3Oyxb.30258$aT....@news-server.bigpond.net.au>...

Just a couple of comments:

> Step 3:
>
> When beta testers play the IF, keep a log of everything they say to the
> NPCs. The beta testers E-mail this to the author (this is more difficult
> than you'd expect since people are worried about privacy).

I guess it might be a problem to get logs out of random players, but I
pretty much always ask for complete transcripts from my beta-testers.
It's far more useful than collecting their comments (since sometimes
the game produces behavior that they don't consider buggy, but that I
would like to change anyway), and no one seems to mind.

> Step 4:
>
> Unfortunately, there is no state with this query system. If the user asks
> "Who is the king?", gets a response "It's King Leopold", and then types in
> "Where is HE now?" the system won't have a clue.

The lack of statefulness is, in my opinion, a fairly serious problem
with this system (at least, as I understand the description thus far).
When I write full games with NPCs, I want them to have some sort of
part in an evolving story, not just to be banks of information like
very fancy encyclopedias; and that means that there are different
things appropriate to say at different times, and different reactions
they're likely to have depending on how things have gone so far.

I don't want to sound too negative, though. I think this is an
interesting kind of research to pursue, but I have yet to see any
chatterbot, programmed in any language, with any amount of training
time, that I thought would be a good addition to an IF game. Some of
them are much more sophisticated conversationalists than the average
IF NPC, but all they're really capable of having is a small-talk sort
of conversation; I haven't seen any that would be good in a situation
with an evolving plot.

Now, if this could somehow be yoked intelligently to the idea of NPC
goalseeking discussed recently in another thread, so that NPCs would
be able to formulate conversation in order to try and get what they
wanted...

Well, then you'd really have something, and not just for IF.

Mike Rozak

Nov 29, 2003, 7:35:01 PM
> The lack of statefulness is, in my opinion, a fairly serious problem
> with this system (at least, as I understand the description thus far).
> When I write full games with NPCs, I want them to have some sort of
> part in an evolving story, not just to be banks of information like
> very fancy encyclopedias; and that means that there are different
> things appropriate to say at different times, and different reactions
> they're likely to have depending on how things have gone so far.

Agree. I suppose I forgot to mention that there are (at least) two types of
state. The one I was talking about (which I didn't make clear) was the
"current topic/subject" being discussed. The other state, as you point out,
is the NPC's responses changing as the story progresses. I can't see any way
to do the second without coding. Even the current topic/subject will require
some coding.


> I don't want to sound too negative, though. I think this is an
> interesting kind of research to pursue, but I have yet to see any
> chatterbot, programmed in any language, with any amount of training
> time, that I thought would be a good addition to an IF game.

Agree with this too. What I described would be a heap of work and wouldn't
go far enough. It's definitely a research topic, and as you point out, has
implications far beyond IF. I suspect that the gist of what I described
(machine learning) is an important direction to research, and that
ultimately (10-20 years) at least some machine learning will be incorporated
into every conversational system.


--

Mike Rozak
www.mXac.com.au


M.D. Dollahite

Dec 1, 2003, 7:54:32 AM
Well, as interesting a line of research as this is, I see some issues that you
may not have considered. What you're talking about right now sounds more like
a development tool rather than part of the run-time conversation system. Once
you've finished your beta testing phase and release the game, the machine has
learned all it's going to learn, and the resulting behaviour from the player's
perspective isn't any different from T3's OOP approach. Does it make any real
difference how the database was actually generated if the result is the same?
I think most people would have an easier time setting up the OOP database than
using the machine learning tool. Now, I was just mentioning a few days ago in
the graphics thread about getting the computer to do the work for you, but in
order for it to be productive you need to be converting a few straightforward
parameters into lots of complicated code, whereas it looks to me like a machine
learning system would require a lot of complicated parameters to generate
relatively straightforward code. A lot of the stuff you talk about -- sentence
parsing, match scores, pronoun resolution, topic-of-discussion, etc -- is
already done by the TADS 3 system, I think all you're really changing is the
way the query/response pairs are collected and stored. And as that TADS 3 tech
article I referred you to mentions, no matter how many questions you and your
testers think of during beta development, players will think of thousands more
that never occurred to you.

The one possible improvement I see over the OOP system is that you can
theoretically deal with more naturally-worded questions more of the time.
Whether that's actually an improvement or not is debatable, as natural language
conversations represent a break from the usual command format. The normal
command system gets players accustomed to a routine where they give orders and
the PC performs actions; then they encounter an NPC and suddenly they're
performing the actions themselves instead of through the PC proxy, and that can
be confusing.

If you're thinking of a system that continues to learn from players even after
final release, be warned: I've seen many a chatbot -- some even programmed by
professional AI researchers -- reduced to a babbling vegetable the instant some
moron decided to flood its database with a bunch of garbage just to see what
would happen. Learning computers tend to be rather brittle that way; they need
lots of creator supervision to keep the database from being mangled by
malicious users.

I think attempting to apply machine learning to IF is in general a great idea.
However I think it will take a few decades of pretty dedicated research to come
up with something that is functional and enough of an improvement to justify
the added complexity. The system you describe here sounds to me like little
more than a novelty for developers.

Quintin Stone

Dec 1, 2003, 9:21:04 AM
On 1 Dec 2003, M.D. Dollahite wrote:

> I think attempting to apply machine learning to IF is in general a great
> idea. However I think it will take a few decades of pretty dedicated
> research to come up with something that is functional and enough of an
> improvement to justify the added complexity. The system you describe
> here sounds to me like little more than a novelty for developers.

It sounds to me like a research project for grad students. Not something
feasible for amateur game makers.

Mike Rozak

Dec 1, 2003, 8:15:29 PM
> Once you've finished your beta testing phase and release the game, the
> machine has learned all it's going to learn, and the resulting behaviour
> from the player's perspective isn't any different from T3's OOP approach.
> Does it make any real

Correct, the system wouldn't be able to learn after beta. (Unless it can
continually provide feedback to the author, such as in a MMORPG.) Learning
from the user's inputs alone (without human intervention) would require
unsupervised machine learning, which as you discussed later, is very
dangerous, especially if the user wants to mess things up. (Even speech
recognition systems are very cautious about this.)

As far as making any real difference: I think that given the small scope of
contemporary NPC conversations it doesn't make any sense at all to include
machine learning. I was just thinking 10-20 years out. 10 years isn't so far
out, because if I decide to undertake a graphical IF system I want an
architecture that will last a while. I haven't yet figured out how a
machine-learned conversation system would affect the architecture, though.

My thoughts proceed as follows:
1) Back in the mid-1980s I wrote several adventure games in BASIC or
Pascal. They worked OK.

2) When I looked over TADS/Inform, I noticed how nicely the OOP language
correlates to the rooms, game objects, and parsing. While I was able to
write an adventure without using OOP, it would have been much easier with
OOP.

3) If I think that machine learning is the way to go, then what kind of
language would facilitate both the machine learning and pattern matching?
While OOP will work (just as BASIC/Pascal would), there might be something
better. I'm not sure what this is, but it's worth a ponder.


--

Mike Rozak
www.mXac.com.au

