Fronting LambdaMOO with voice

Jo Jaquinta

Feb 11, 2019, 11:57:11 PM
to MOO Talk
Hey Folks,
    I did a lot of LambdaMOO work back in the 90s, just after college. Lately I've been building voice apps for Amazon Alexa and Google Home. Right from the start I felt that a MOO-style back end would be a very good fit for creating a user-content-driven, extensible voice experience. I've looked into it off and on over the past three years, and now a bit more seriously than before.
    There are two main issues I face with using LambdaMOO directly.
    1) Voice is asynchronous. LambdaMOO is built around a pipe that you open and stream data up and down. Sure, I might be able to fake asynchronicity with a kludge around it, or hack the way network sockets are tied to sessions. But I'm not sure how much I'd end up having to change if I started down that route.
    2) Voice is imprecise. Some level of fuzzy matching is going to be necessary. One of the huge advantages of LambdaMOO is that, at any point, you should be able to work out from the current context all the commands that could be given, and use that as the basis for matching against what was actually said (sketched below). But LambdaMOO isn't currently structured that way, and I'm not sure how much I'd have to rototill to get that functionality.
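
    To sketch what I mean, here's rough Java for enumerating that vocabulary. MooObject and everything on it is an invented stand-in, not anything from LambdaMOO; a real version would also expand verb argument specs (dobj/prep/iobj) into fuller phrases.

import java.util.ArrayList;
import java.util.List;

// Invented stand-in for a MOO object; not a real LambdaMOO structure.
class MooObject {
    String name;
    List<String> verbs;              // e.g. "open", "read", "take"
    List<MooObject> contents;

    MooObject(String name, List<String> verbs, List<MooObject> contents) {
        this.name = name; this.verbs = verbs; this.contents = contents;
    }
}

class Vocabulary {
    // Every command phrase derivable from the player, the room, and their contents.
    static List<String> candidates(MooObject player, MooObject room) {
        List<String> out = new ArrayList<>();
        collect(player, out);
        collect(room, out);
        return out;
    }

    private static void collect(MooObject obj, List<String> out) {
        for (String verb : obj.verbs)
            out.add(verb + " " + obj.name);   // "open chest", "read sign", ...
        for (MooObject child : obj.contents)
            collect(child, out);
    }
}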

    One approach I have been looking at is doing a basic rewrite of LambdaMOO from the specs in a more modern language (Java). With a more recent MVC approach, the interaction part could be done as a layer, allowing multiple asynchronous or synchronous front ends. It would be pretty trivial to change the scripting language to JavaScript, which is much more approachable for a modern audience.
    The biggest stumbling block with this approach is whether to support MOOScript as well. That's probably nearly as much work as the rest of the code put together. But if I don't do it, I'd have to create a core from scratch or try to port all the verbs in an existing core to JavaScript.

    Looking for thoughts from people. I'm not really conversant with all the patches and forks that have happened in the last 15 years or so, and I don't want to re-invent the wheel if I don't have to.

       Cheers,
              Jo

Todd Sundsted

Feb 12, 2019, 2:37:31 AM
to MOO Talk
Killer idea! But I don't think you have to rewrite to get it off the ground. Amazon Alexa and Google Assistant both ultimately send POST requests to backend services, with voice-to-text data in the payload and some kind of user identification. Use the user identification to establish session continuity for the user, and pass the command to the existing MOO parsing code or something custom.
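
Something like this minimal sketch, using just the JDK's built-in HTTP server. Caveat: real Alexa/Assistant requests are JSON, and MooSession here is a stand-in I'm inventing for whatever bridges to the MOO:

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

public class VoiceGateway {
    // userId -> session: the user identification is what gives you continuity.
    static final ConcurrentHashMap<String, MooSession> sessions = new ConcurrentHashMap<>();

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/voice", exchange -> {
            // Naive payload of "userId\nutterance"; a real skill sends JSON.
            String[] parts = new String(exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8).split("\n", 2);
            String userId = parts[0];
            String utterance = parts.length > 1 ? parts[1] : "";
            MooSession session = sessions.computeIfAbsent(userId, MooSession::new);
            byte[] reply = session.execute(utterance).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(reply); }
        });
        server.start();
    }
}

// Invented bridge to the MOO: could wrap a telnet connection, or in-process parsing.
class MooSession {
    MooSession(String userId) { /* connect and authenticate as userId */ }
    String execute(String command) { return "..."; /* run command, gather output */ }
}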

Todd

Jo Jaquinta

Feb 12, 2019, 8:41:09 AM
to MOO Talk
I'm intimately familiar with how Alexa and Assistant work. I've spent the last four years becoming a "world-class expert" on the subject: speaking at conferences, writing books, etc., etc. And, yes, unique identification is not an issue. The issues, as I see them, are the ones I listed.
It's (1) converting the call-and-response, REST-style asynchronous communication to the synchronous telnet style that LambdaMOO uses. From my much more limited (and ancient) knowledge of LambdaMOO, that will require more serious surgery on the part of the server.
And (2) voice is not precise. The design model for both Alexa and Assistant requires you to pre-declare your vocabulary up front. That, obviously, isn't going to work for a dynamic, user-content-driven system like LambdaMOO, so I have to use some tricks to turn off Alexa/Assistant's NLP and leave them as a sort of simplified speech-to-text machine. STT is only about 90% accurate, though, which sounds great but means it gets one word in ten wrong. So fuzzy matching is required, which is fine; I've got some pretty comprehensive libraries for that, developed over my years of voice work.

But to make all that work, I need to know, per request, what the current vocabulary is. Currently the command parser iterates over the objects in context, and over each verb on those objects, and selects the first match. I need to change this to iterate over everything, compute how closely each possibility matches, and then choose the match with the highest confidence (or error out if no single match stands out from the rest, or degrade gracefully if there are two or more close matches, etc., etc.). That, again, seems to me to require fairly major surgery on the server side.
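
To make that concrete, the selection policy I'm describing looks roughly like this in Java. Plain Levenshtein similarity stands in for my actual fuzzy-matching libraries, and the 0.75 and 0.1 thresholds are pulled out of the air for illustration:

import java.util.List;

class FuzzyMatcher {
    // Best candidate, or null if nothing stands out or two candidates are too close.
    // (A real version would distinguish "no match" from "ambiguous" so it can degrade
    // gracefully, e.g. by asking the player which one they meant.)
    static String bestMatch(String heard, List<String> candidates) {
        String best = null;
        double bestScore = 0, runnerUp = 0;
        for (String c : candidates) {
            double score = similarity(heard, c);
            if (score > bestScore) { runnerUp = bestScore; bestScore = score; best = c; }
            else if (score > runnerUp) { runnerUp = score; }
        }
        if (bestScore < 0.75) return null;            // no confident match: error out
        if (bestScore - runnerUp < 0.1) return null;  // two close matches: ambiguous
        return best;
    }

    // Levenshtein distance normalized to [0,1], where 1.0 means identical.
    static double similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return 1.0 - (double) d[a.length()][b.length()]
                / Math.max(1, Math.max(a.length(), b.length()));
    }
}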

If anyone can give me a reality check on how difficult either of the above is, or if anyone has already tried a patch like that for LambdaMOO, that would be illuminating. Otherwise, my biggest stumbling block to a rewrite is implementing MOOScript in Java. If someone has re-implemented the language that way, I'd like to hear about their experience too.

Right now my wife (my boss for voice stuff) is pushing me to just do an extremely cut-down version of LambdaMOO as a trial. We've done interactive asynchronous stuff with SubWar (sort of a 3D voice Battleship) and an expansive, geography-based fantasy game, Six Swords (basically D&D via voice). The latter has some primitive features for adding user content. What I really want is the interaction of SubWar, but social rather than martial, combined with the expansiveness of Six Swords and better options for user content.

General content creation via voice is never going to work well; I've been trying it for a few years. But a hybrid system would enable that sort of thing: regular players interact via voice, while programmer-bit players can also work through a telnet-like interface. My most enthusiastic users are visually impaired. They've been playing telnet-based MUDs for years. If I can enable them as content creators to drive the mainstream voice content, that would be perfect.
And this is why I turned back to LambdaMOO: it has the features, capabilities, and history that would serve such an aim well. There are just these humps to get over.

Cheers,

Jo

David Given

Feb 12, 2019, 9:35:44 AM
to Jo Jaquinta, MOO Talk
I can't comment on how complicated it would be to rework the parser, but it doesn't sound too difficult. For each word being matched there will be a set of candidates available, so it should just be a matter of changing the selection algorithm from exact-match to best-match. Whether the parser restricts the candidates to just the ones allowed by the grammar, or whether it just has a big bag of words, I don't know.

I do know that reworking it to use best-match is also of value to ordinary plain text because it allows helpful 'I don't understand spanner in this context; did you want to hit the rock with the hammer instead?' responses.
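
E.g., reusing the similarity() helper from Jo's sketch upthread, something as simple as this (all of it invented, of course) would do:

import java.util.List;

class Suggestions {
    // Build a "did you mean" reply from the nearest known command, however weak the match.
    static String didYouMean(String heard, List<String> candidates) {
        String nearest = null;
        double best = 0;
        for (String c : candidates) {
            double score = FuzzyMatcher.similarity(heard, c);  // from Jo's sketch
            if (score > best) { best = score; nearest = c; }
        }
        return nearest == null
                ? "I don't understand '" + heard + "' in this context."
                : "I don't understand '" + heard + "' in this context; did you mean '"
                        + nearest + "'?";
    }
}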



Chad Hill

Feb 12, 2019, 11:54:07 AM
to MOO Talk
I've seen several MOO-based web servers over the years.

As long as you can carry a value from request to request (can you have a session?), you can make a MOO that responds like a web server: each request is a connection, with the request transmitted and then a response read back until some end token is sent. You can even do basic auth if you need it. You can probably use one of the existing DBs that has an $http_server of some kind.
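
From the client's side, that cycle is just this sketch (the "session" line and the <<END>> token are made up; your $http_server would define its own conventions):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

class MooRequestClient {
    // One request = one connection: send the command, read until the end token.
    static String request(String host, int port, String sessionToken, String command)
            throws Exception {
        try (Socket sock = new Socket(host, port);
             PrintWriter out = new PrintWriter(sock.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(sock.getInputStream()))) {
            out.println("session " + sessionToken);  // the value carried between requests
            out.println(command);
            StringBuilder reply = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                if (line.equals("<<END>>")) break;   // made-up end token
                reply.append(line).append('\n');
            }
            return reply.toString();
        }
    }
}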

You might have to write your own matching $util, but it's not overly complicated MOO code to write something that behaves the way you want.

-- 
Chad Hill

Brendan B

Feb 12, 2019, 9:31:19 PM
to MOO Talk
I looked into doing something with this a while back with Alexa. The biggest issue was pushing info to the Alexa when something happened on the MOO that deserved attention (think someone saying something to a character, or entering a room). At the time, Alexa didn't support this kind of push interaction. It's been 2-3 years, and my info is out of date on whether they allow that now, or whether it's even relevant to what you are doing. But I'd be interested to hear if it's possible.



Jo Jaquinta

Feb 12, 2019, 11:01:20 PM
to MOO Talk
Hey Brendan,
I'm afraid Alexa still does not (effectively) have push notifications, at least none that could be used for what you are thinking. Nor does Google Home. Nor, in my opinion, are they ever likely to: there are too many security implications.
However, that doesn't mean you can't do an asynchronous thing. If you have an Alexa (or Pixel phone) you can run my Sub War app as an example ("Alexa, open Sub War" or "OK Google, talk to Sub War"). That's the Battleship-like game, but it's multi-player, with all players moving asynchronously. The key is to queue up the messages that players are due to get. Then, when they connect for a command, the first playback they hear is their queued messages, followed by the immediate results of their command.
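
The queue itself is trivial. In the kind of Java rewrite I'm contemplating it's basically this (names invented):

import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class PendingMessages {
    private final ConcurrentHashMap<String, Queue<String>> byPlayer = new ConcurrentHashMap<>();

    // Called whenever something happens that a player should hear about.
    void enqueue(String playerId, String message) {
        byPlayer.computeIfAbsent(playerId, k -> new ConcurrentLinkedQueue<>()).add(message);
    }

    // Called at the start of each voice request; the drained backlog is spoken
    // before the immediate results of the player's command.
    String drain(String playerId) {
        Queue<String> q = byPlayer.remove(playerId);
        if (q == null) return "";
        StringBuilder sb = new StringBuilder();
        for (String msg : q) sb.append(msg).append(' ');
        return sb.toString();
    }
}
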
It's not perfect for the social-chat-like interface I'm looking for. But I figure if there's also a geography to explore, then when there aren't people around to chat with, players can wander and look or build. If people do turn up, they can chat. (Or just sit there and say "listen" every 8 seconds to follow the flow.)
Cheers,
Jo

Jo Jaquinta

Feb 13, 2019, 11:00:29 AM
to MOO Talk
On Tuesday, February 12, 2019 at 9:35:44 AM UTC-5, David Given wrote:
For each word being matched there will be a set of candidates available, so it should just be a matter of changing the selection algorithm from exact-match to best-match.
I've written several hundred thousand lines of C code in my lifetime, but most of it was more than a lifetime ago! Threading through the MOO code...

So, for dobj/iobj, the crux seems to come down to match_proc() in match.c. This seems to be where the character-by-character comparison is done. It already has logic to decide between an exact match and a partial match, which is good, but I'm not quite sure how the logic there distinguishes between a partial match and an ambiguous match. I could probably extend the structure used here, add a floating-point "confidence", and do my fuzzy match at this level.

However, the surrounding logic (in match_contents() in match.c) is a bit linear. But I guess for dobj/iobj that's OK; there isn't any concept of "nearness".

The verb matching is done later (I'm not sure why), and it seems to come down to a number of db_find_XXX_verb() functions in db_verbs.c. These, and the logic around them, appear to be all-or-nothing, so partial/ambiguous/fuzzy matching would have to be introduced up and down the chain here. On the plus side, since there is nothing there already, confidence-based structures could be added from the start without worrying about how they play with existing logic. On the minus side, it's a chunky change to a critical area of the program.
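
For what it's worth, the confidence-carrying structure I'm picturing is roughly this, sketched in Java since that's my candidate rewrite language (all names invented; in C it would just be a field or two added to the existing structs):

import java.util.ArrayList;
import java.util.List;

enum MatchQuality { EXACT, PARTIAL, FUZZY }

// One scored candidate: T would be an object for dobj/iobj matching,
// or a verb for the db_find_XXX_verb() style lookups.
class ScoredMatch<T> {
    final T target;
    final MatchQuality quality;
    final double confidence;   // 1.0 for exact, shrinking as fuzziness grows

    ScoredMatch(T target, MatchQuality quality, double confidence) {
        this.target = target; this.quality = quality; this.confidence = confidence;
    }
}

// Accumulate every candidate with its score instead of stopping at the first hit;
// the best/ambiguous/no-match decision then happens once, at the end.
class MatchAccumulator<T> {
    final List<ScoredMatch<T>> all = new ArrayList<>();

    void offer(T target, MatchQuality quality, double confidence) {
        all.add(new ScoredMatch<>(target, quality, confidence));
    }
}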

Has anyone done patches in this area? I saw that there were some ports to support Unicode. That might touch on these areas.

Jo