Newsgroups: comp.sys.amiga.misc
From: "Timothy Rue" <tim...@mindspring.com>
Date: 1997/12/28
Subject: Re: Speech

On 27-Dec-97 06:46:13  <-AD-> wrote:

>These were the words of <Timothy Rue> on 26 Dec 97 19:29:54 -0500:
>> On 25-Dec-97 01:28:37 Vanilla Gorilla <v...@monkeyshines.com> wrote:
>> >On 24-Dec-97 05:27:00  wrote something like this....
>> >>These were the words of <bri> on Tue, 23 Dec 1997 20:03:10 -0000:

>> >>> Im looking for some software for voice recignition on the amiga.  Is
>> >>> there any software out there and if so then where, as i have looked in
>> >>> aminet and a few other sites, but no luck so far.  Also is is possible
>> >>> to get a reasonable sounding voise out of my amiga as "SAY" sounds
>> >>> completely crap.

>> >>There's a simple voice recognition type thing on Aminet which does CLI
>> >>commands by speech input with a sampler. I played with it a couple of
>> >>years back, and it worked OK. Not very useful, but it was kind of neat
>> >>doing Star Trek type stuff: "Computer, Format drive dh0: name..."

>> >> util/misc/VS121.lha  (may be more recent version, didn't check)

>> >there is a more recent version VS122, also there is VoiceShell
>> >1.33. Don't know if it is the same app or or a different one.

>> >>--
>> >>    <-AD->   <morse 'at' ahab.demon.co.uk>

>> Am I correct in believing these apps have a vocabulary limit and are also
>> user specific?
>As I remember it, there isn't an actual limit to the size of the word
>dictionary given in the docs, but the more words you add, the slower
>the recognition becomes. You can set a value for the 'strictness' of
>match between what you say and what is in the dictionary, and this
>affects speed also. When I was playing with the software, it worked
>reasonably well on a friend's spoken commands after being programmed
>with my voice.
>I recall the system using a library (voice.library) for the
>recognition part, so I imagine this could be used to base other speech
>recognition applications on.
>Does this mean that VIC is going to include speech input?

No, but this doesn't exclude the VIC from being used to improve speed of
such voice command software.

The key thing to understand about the VIC is that it is not to become
bloatware. Its evolutionary direction is just the opposite
(streamline-ware): a set of fundamental functionality that supports
Virtual Interaction. It is through integration (team work) of its primary
functions that we achieve additional functionality and power. Changes in
its specs are only to be made in order to better integrate (improve the
team work benefits) so it can handle "any" exceptions. Finding exceptions
it cannot handle identifies something in need of improvement. Of course
the target objective of the VIC is to provide a tool having ultimate
versatility in its ability to integrate.

--

You said the voice application slows down as the word dictionary grows.
This is a search problem where applying constraints can help improve
speed. Having the ability to change the constraints as you go along can
bring additional speed. No need to search what you know is not going to
match. By narrowing down (you might call this focusing) the search based
on input sequence, speed might well increase as you go along (to reach a
given objective).
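
As a rough illustration of that point, here is a minimal Python sketch. The
word lists and the names (BY_CONTEXT, FLAT, count_comparisons) are made up
for this post and are not taken from VS, VoiceShell or the VIC. A real voice
app compares sound patterns rather than text, so this only counts how many
entries a plain linear search would have to look at.

```python
# Hypothetical word lists, one per context (not from any real voice app).
BY_CONTEXT = {
    "top": ["computer"],
    "groups": ["utilities", "games", "paint"],
    "utilities": ["format", "backup", "copy"],
    "drives": ["df0", "df1", "dh0"],
}

# The unconstrained case: every word lives in one flat dictionary.
FLAT = [word for words in BY_CONTEXT.values() for word in words]


def count_comparisons(candidates, spoken):
    """Linear search; return how many entries were compared before a match."""
    for count, word in enumerate(candidates, start=1):
        if word == spoken:
            return count
    return len(candidates)


flat_cost = count_comparisons(FLAT, "dh0")
narrow_cost = count_comparisons(BY_CONTEXT["drives"], "dh0")
print(flat_cost, narrow_cost)  # prints: 10 3
```

The saving grows with the total vocabulary, since the narrowed search only
ever sees the words that make sense at that point in the input sequence.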

This might sound complex if you're thinking of pattern matching voice
patterns. But there are several ways of thinking about it and doing it, all
with the same underlying concept of applying and changing constraints as
you go.

To get an easy idea of this, take a look at the front section of a
thesaurus. There you will find the "Plan Of Classification" and "Tabular
Synopsis Of Categories". These provide you with what you might see as a
search tree. Through these you can quickly find a word with the meaning
you want, without searching the whole book.

Now let's apply this to the voice command applications, but not on the
primary level of voice patterns (this would be too difficult for most of us
to organize into a search tree). But we can do it with changing word
dictionaries.

First word: Computer (so it knows we are talking to it.)

Second word: Utilities (for the group application type dictionary)

Third word: Format (the application we want as well as its dictionary)

Fourth word: "D" (for the drives dictionary)

Fifth word:  "F" (for the floppy dictionary)

Sixth word:  "0" (identifying the specific drive)
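
The six steps above can be sketched as a walk through linked dictionaries.
This is only an illustration in Python: the DICTIONARIES table, its
dictionary names and the walk function are hypothetical, and each entry
simply names the dictionary to switch to next.

```python
# Hypothetical dictionaries: each maps a recognized word to the name of the
# dictionary to switch to next (None means no further dictionary is defined).
DICTIONARIES = {
    "top": {"computer": "groups"},
    "groups": {"utilities": "utilities"},
    "utilities": {"format": "format"},
    "format": {"d": "drives"},
    "drives": {"f": "floppy"},
    "floppy": {"0": None, "1": None},
}


def walk(words, start="top"):
    """Follow the words down through the dictionaries, recording each step."""
    current = start
    path = []
    for word in words:
        entries = DICTIONARIES[current]
        if word not in entries:
            raise KeyError(f"{word!r} is not in the {current!r} dictionary")
        path.append((current, word))
        next_dictionary = entries[word]
        if next_dictionary is None:
            break  # nowhere further to go from here
        current = next_dictionary
    return current, path


final, path = walk(["computer", "utilities", "format", "d", "f", "0"])
print(final)  # prints: floppy
```

Each word is only ever matched against the handful of entries in the
current dictionary, and the walk ends inside the floppy dictionary.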

At this point we seem to have gotten ourselves into a corner, stuck in the
floppy dictionary, unable to complete the command. Although we could have
just included in the format dictionary all the things format might be used
on (which might include text files), this problem is worth a better look.

We know other actions might be performed on DF0 (i.e. backup, cd, reorg,
etc.). Instead of duplicating or reinventing this data in each
application's dictionary, it's better to simply provide specific
dictionaries and ways of getting to them from within the dictionaries that
need them. In this case, within the format dictionary, it's "D" (which is
set/defined to get the drive dictionary, though "D" might mean something
else in another dictionary).

Because DF0 can be used in so many ways, we cannot define its (or "0")
definition action to go back to the format dictionary. But we do know that
DF0 is an argument to something, in this case the format utility. And
what we are really doing is creating a command line to execute. So we put
this into the contents of a variable, as is done with each element.

We have worked our way down in dictionaries and now we need to get back up
to the format dictionary and eventually back up to the dictionary
containing the word "Utilities" in order to execute the command.

But HOW do we "get back up"?

Cycles within cycles within cycles, etc..  Think of it this way:

Many years ago (while studying programming in college) I came up with what
has got to be the simplest program flowchart method of all. Circles within
circles. Draw a circle and within this circle draw a smaller circle that
touches the outer circle at some point. Consider the outermost circle of a
program its "main" function and the smaller circles that touch it
"sub-routines". Of course any circles within "sub-routines" that touch
them are sub-routines of these "sub-routines", etc.

A cycle is made up of a sequence of actions, and the point where an inner
circle touches an outer one is its entry and exit point. You might see
the OS as the biggest circle for applications but the real biggest circle
is the computer on/off switch. :)

With the above we can see that we need to create and use cycles. Cycles
we can enter into and exit out of. How do we do this?

Scripts: the basic element of a user created cycle is the sequencing of
actions in a script. Exiting a script or cycle is as simple as just
coming to the end of the script or reaching a command that ends the script.

By having scripts executed within scripts, inherently "getting back" is
simple. In the VIC this is done thru the SF line in the PK file, a stack
of filenames and line-numbers, keeping the overhead of script file/line
sequencing down to a minimum.
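
To make the "stack of filenames and line-numbers" idea concrete, here is a
small Python sketch. It is not the VIC's actual SF/PK format, which isn't
spelled out in this post. The SCRIPTS table and the "run", "end" and "say"
commands are invented for illustration, and the scripts live in memory
rather than in files.

```python
# Hypothetical scripts: "run <name>" enters an inner script, "end" leaves the
# current one early, and "say <text>" stands in for any ordinary action.
SCRIPTS = {
    "utilities": ["say utilities", "run format", "say execute"],
    "format": ["say format", "run drives", "say back in format"],
    "drives": ["say DF0", "end", "say never reached"],
}


def run(entry):
    """Run scripts using an explicit stack of [script name, next line] frames."""
    log = []
    stack = [[entry, 0]]
    while stack:
        frame = stack[-1]
        name, line_no = frame
        lines = SCRIPTS[name]
        if line_no >= len(lines):
            stack.pop()  # ran off the end of the script: back to the caller
            continue
        frame[1] += 1  # remember where to resume in this script
        line = lines[line_no]
        if line == "end":
            stack.pop()  # explicit exit from the current script
        elif line.startswith("run "):
            stack.append([line[4:], 0])  # enter an inner cycle
        elif line.startswith("say "):
            log.append(line[4:])
    return log


print(run("utilities"))
```

Getting "back up" needs no special logic: popping the top frame resumes
the outer script at the line after the one that called the inner script.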

Back to our format task:

Ok, we have established which "drive" to "format" and only need to exit
any inner scripts within the format script and set the voice dictionary
being used back to the format dictionary.

Q: What's this format script? How do we know that the user's voice input
   is following this script?

Let's recap for a moment:

The word "computer" gets the system ready for input (gets its attention
on mic input beyond the attention-getting word "computer" itself).

To keep things simple and direct, we will have the word "computer" also
trigger the VIC AI command, setting the VIC up for use by the voice
app. and waiting for input from it.

++ Remember, this is not voice patterns we are getting from the voice app.
but the results of its pattern matching against its own (the voice app's)
dictionaries. Dictionaries which we will be *changing* via the VIC.

In the spirit of simplicity, the voice app. is only used to match a
voice pattern (sound) with a word (text). A word it sends to the VIC.
In other words: The Voice app only translates from sound to text, NOT
command execution.

The VIC, in its script processing, first changes the voice app.
dictionary used (for search speed). Likewise, it can change its own (VIC)
dictionary to parallel the voice dictionary change. The next action (VIC
script) is to get input. Ah, a cycle: get input, act on it and repeat.

A: We are building a command line that has specific elements. All we need
do is fill in each required element and any optional elements. So long as
we get back to the format dictionary we can fill in each of these elements
in the same way as we did "DF0", and with disregard to the final or proper
sequence of argument elements in a command line. With this in mind the
format script may be as simple as "get input, act on it and repeat".

There is one more thing needed here. We need a word to exit out of this
cycle ("go" or any word defined in the format dictionary to do so) and in
the process place the elements in proper order. The result is of course
placed in a variable that the utility script sends to a command line for
execution. Once sent, the simple utility script exits by shutting the VIC
down, which cleans up before shutdown.
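
The format cycle and its exit word can be sketched roughly as below. This is
a Python illustration under stated assumptions, not how the VIC does it:
build_format_command is a made-up name, the rule that spelling a name ends
at the first word longer than one character is my own simplification, and
the output order "format drive ... name ..." is simply assumed to be the
proper one. The command line is only returned as text, never executed.

```python
def build_format_command(spoken):
    """Turn recognized words into a command line string (never executed)."""
    words = spoken.lower().split()
    # The leading words select the dictionaries on the way down.
    if words[:3] != ["computer", "utility", "format"]:
        raise ValueError("expected 'computer utility format ...'")
    elements = {}  # filled in whatever order the user speaks them
    i = 3
    while i < len(words):
        word = words[i]
        i += 1
        if word == "go":  # exit word: place the elements in proper order
            missing = [key for key in ("drive", "name") if key not in elements]
            if missing:
                raise ValueError(f"missing elements: {missing}")
            return f"format drive {elements['drive']} name {elements['name']}"
        if word == "name":
            # Spelling sub-cycle: collect single letters, then fall back to
            # the format dictionary at the first longer word (an assumption).
            letters = []
            while i < len(words) and len(words[i]) == 1:
                letters.append(words[i])
                i += 1
            elements["name"] = "".join(letters)
        elif word == "drive":
            # Drive sub-cycle: collect characters up to and including the
            # unit digit, then return to the format dictionary.
            parts = []
            while i < len(words):
                parts.append(words[i])
                i += 1
                if parts[-1].isdigit():
                    break
            elements["drive"] = "".join(parts).upper() + ":"
        else:
            raise ValueError(f"{word!r} is not in the format dictionary")
    raise ValueError("no 'go' word received")


print(build_format_command("computer utility format drive d f 0 name f u n go"))
```

Because the elements are stored by name and only ordered when the exit word
arrives, speaking the name before the drive gives the same command line.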

So the vocal input might be:

"Computer utility format name f u n drive D F 0 go".

Ultimately it's a trade-off between VIC processing time and total voice
dictionary size, to accomplish overall increased voice processing speed.
With only a small voice dictionary, it might not be worth it. But if you
plan on expanding the voice dictionary(s) to a large size, or perhaps to
handle specific individuals, then it will be worth it. Of course those
likely to benefit most from such are the handicapped, even those with
additional speech problems (voice word dictionaries customized for them).

I know I've left out some details (specifics of how the VIC does these
things) but I'm already pushing flames with the size of this post.

In the same way the VIC gets input from the voice app, this input could
come from other methods. Also such a VIC process could be created to
allow easy creation of parallel voice/vic dictionaries.

But do understand the VIC does not do the work of a voice translation
program, but can assist in its use. No need to reinvent the wheel, just a
way to integrate the wheel with other things, including other application
control.

A few notes:

Dragon Systems is perhaps the best or at least most notable voice command
software available today. But it's for the PC. I've seen commercials for,
I think, IBM e-business regarding voice to text conversion. And I know the
Mac has some pretty decent computer voice output.

I hope I haven't bored or otherwise confused anyone. Perhaps there will
even be an improvement in how seriously the VIC is taken. Maybe it will
even inspire some to openly contribute to the creation of the VIC.

There are two more notes (They've been on my mind).

1) Thru the ICOA's resource pages there is mention of seven different
methods of external program control on other systems, but that AREXX (port)
on the AMIGA is the best, not having the limitations or disadvantages
these other methods have.

I didn't know there were seven others, but I know AREXX (port) is very
usable. With this in mind, perhaps now there is not so much competitive
concern regarding my stand of the VIC being freeware. Certainly as the VIC
gets out and people find its usefulness, they will be drawn towards the
system that allows optimum use of the VIC. The Amiga. And yes, I've always
known this. :)  Additionally I'm well aware imitators will always crop up
but having it right to begin with, well it only leaves the imitators with
something less than right to do. :)

2) The archived resource of the newsgroups. What are we putting into this
archive? A lot of garbage and a little information that will be usable over
time? It's our choice, let's not waste this potentially valuable resource
too badly. Consider how each newsgroup is specific in its overall topic.
Consider how all newsgroups might be mapped like the front of a thesaurus
so topic and field specific information can more easily be found. For
example where might one find natural language processing functionality?
How about CAD specific vocabularies? etc. The foundation size of this
archived resource is huge and it follows that it has a huge potential,
far greater than what anyone or any group could hope to build outside of
this already built foundation.

I just wish I had more time to focus my effort, but the VIC has gotta be
freeware, I gotta earn an income and right now it only leaves me one day a
week to do domestic chores and focus on the VIC and related things (i.e.
web page updating). And to think I don't currently work with computers.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*3 S.E.A.S - Virtual Interaction Configuration (VIC) - VISION OF VISIONS!*
   *~ ~ ~    Advancing the way we Perceive and Use the Tool of Computers!*
Timothy Rue                               What's DONE in anything we do?
Email @ tim...@mindspring.com             v<--------<----9----<--------<
Web @ http://www.mindspring.com/~timrue/  | *AI PK OI IP OP SF IQ ID KE* |
                                          >INPUT->(Processing)->OUTPUT>^
Search email/name @ http://www.dejanews.com for other puzzle parts/posts.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


 