I'm intimately familiar with how Alexa and Assistant work. I've spent the last four years becoming a "world class expert" on the subject, spoken at conferences, written books, etc, etc. And, yes, unique identification is not an issue. The issues, as I see them, are what I listed.
It's (1) converting a call-and-response REST style of asynchronous communication to a synchronous Telnet style that LambdaMOO uses. From my much more limited (and ancient) knowledge of LambdaMOO, that will require more serious surgery on the part of the server.
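To make the mismatch in (1) concrete, here is a minimal sketch of what a bridge between the two styles would have to do: take one voice request, push it down a line-oriented connection, and decide when the reply is "finished" so it can be returned as a single response. The function name `send_and_collect` and the idle-timeout heuristic are assumptions made for illustration; they are not part of LambdaMOO or of the Alexa/Assistant SDKs, and whether an external bridge like this is enough, versus changes inside the server, is exactly the open question.

```python
import socket


def send_and_collect(sock, command, idle_timeout=0.5, max_bytes=65536):
    """Send one command line and gather output until the line goes quiet.

    Hypothetical helper: a line-oriented server gives no "end of reply"
    marker, so silence for idle_timeout seconds is treated as the end.
    """
    sock.sendall(command.encode("utf-8") + b"\r\n")
    sock.settimeout(idle_timeout)
    chunks = []
    total = 0
    while total < max_bytes:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            break  # nothing arrived within idle_timeout: assume reply is complete
        if not data:
            break  # the other end closed the connection
        chunks.append(data)
        total += len(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

The weak point is visible in the code: anything the server sends after the idle window (another player speaking, a delayed task) is missed by this request and would have to be queued for the next one.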
And (2) voice is not precise. The design model for both Alexa and Assistant requires you to pre-declare your vocabulary up front. That, obviously, isn't going to work for a dynamic user-content-driven system like LambdaMOO, so I have to use some tricks to turn off Alexa/Assistant's NLP and leave it as a sort of simplified Speech-To-Text machine. It's only about 90% accurate for STT though, which sounds great, but means that it gets one word in ten wrong. So fuzzy matching is required, which is fine. I've got some pretty comprehensive libraries for that, developed over my years of voice work. But, to make all that work, I need to be able to know, per request, what the current set of vocabulary is. Currently the command parser iterates over the objects in context, and over each verb on those objects, and selects the first match. I need to change this to iterate over everything, compute how closely each possibility matches, and then choose the match with the highest confidence (or error out if no match stands out from the rest, or degrade gracefully if there are two or more close matches, etc, etc.) That, again, seems to me like something that would require fairly major surgery on the server side.
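As a rough illustration of the "score everything, then pick the best" parsing described in (2), here is a small sketch using Python's standard `difflib`. The function name `best_match`, the thresholds, and the flat list of candidate phrases are all assumptions for the example; a real version would build the candidates from the objects and verbs in context, and would likely use a matcher tuned for speech errors rather than plain character similarity.

```python
from difflib import SequenceMatcher


def best_match(heard, candidates, min_score=0.6, min_margin=0.1):
    """Score every candidate and return (status, phrases).

    status is "match" (one clear winner), "ambiguous" (several candidates
    within min_margin of the top score), or "none" (nothing scored at
    least min_score). Thresholds are illustrative guesses.
    """
    scored = sorted(
        (
            (SequenceMatcher(None, heard.lower(), c.lower()).ratio(), c)
            for c in candidates
        ),
        reverse=True,
    )
    if not scored or scored[0][0] < min_score:
        return ("none", [])
    top_score = scored[0][0]
    # Everything within min_margin of the best score counts as a contender.
    close = [c for s, c in scored if top_score - s < min_margin]
    if len(close) > 1:
        return ("ambiguous", close)
    return ("match", close)


# Hypothetical vocabulary for one request.
vocabulary = ["take lamp", "take lantern", "drop lamp", "look"]
print(best_match("take lamb", vocabulary))
print(best_match("red bull", ["red ball", "red bell"]))
```

An "ambiguous" result is where the voice side could ask a follow-up question ("the ball or the bell?") instead of silently picking the first hit.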
If anyone can give me a reality check on how difficult either of the above is, or if anyone has already tried a patch like that to LambdaMOO, that would be illuminating. The alternative is a re-write, and my biggest stumbling block there is implementing MOOScript in Java. If someone has re-implemented the language that way, I'd like to hear about their experience too.
Right now my wife (my boss for voice stuff) is pushing me to just do an extremely cut-down version of LambdaMOO as a trial. We've done interactive asynchronous stuff with SubWar (sort of a 3D voice battleship) and an expansive geography-based fantasy game, Six Swords (basically D&D via voice). The latter has some primitive features for adding user content. What I really want is the interactivity of SubWar, but social rather than warfare, combined with the expansiveness of Six Swords and better user content options. General content creation via voice is never going to work well. I've been trying that for a few years. But a hybrid system where, essentially, regular players interact via voice, but there is also a telnet-like interface where programmer-bit players can work, would enable that sort of thing. My most enthusiastic users are visually impaired. They've been playing telnet-based MUDs for years. If I can enable them as content creators to drive the mainstream voice content, that would be perfect.
And this is why I turned back to LambdaMOO. It has many of the features, capabilities, and history that would serve such an aim well. There are just these humps to get over.
Cheers,
Jo