The Quick T2SI Lite software allows the development of Speaker Independent vocabularies in a very easy Text-to-Speech fashion. This Windows application enables very quick and efficient development of Speaker Independent voice recognition applications.
Quick T2SI Lite incorporates the latest advances in neural networks combined with Hidden Markov Modeling to create a powerful phonemic recognizer. It uses text entry to create, edit, build, and download embedded vocabularies to the EasyVR.
If your board does not have the IOREF pin but it is running at 3.3V, you can still operate the EasyVR Shield 3 correctly if you manually connect pins IOREF and 3V3 together, for example with a jumper wire.
The EasyVR module, made by Veear and available from several other distributors, is a small, low-cost voice recognition module. Pricing is about the same as mbed. A basic speech recognition demo was working within about an hour of opening the box. The black potted IC in the middle is likely the processor chip and the large chip is flash. Most likely, it is one of the ICs from Sensory that was used in the recent reincarnation of Furby and quite a few other embedded devices and toys.
It outputs a TTL-level serial signal and runs off of 3.3V. Just plug in the microphone, hook up power, and connect the serial RX/TX pins. Don't forget that RX and TX swap when connecting to mbed (i.e., RX-TX and TX-RX), and be very careful not to swap the color-coded power pins!
The serial bridge code below can then be run on mbed so that it can talk to their PC-based EasyVR GUI training program over mbed's USB Virtual Com Port. This software allows the user to create and test new speaker dependent (i.e., trained for one person) command words.
It comes with some built-in speaker independent voice recognition commands (available in English, Italian, Japanese, German, Spanish, and French). Here is a demo based on the number commands. The video uses the set of number words (0..10) to toggle the 4 LEDs on mbed. The demo needs some more work to add the timeout and error code checking suggested in the manual, but it works fairly well without them. Commands and responses are all sent as printable ASCII characters.
For speech synthesis, the EasyVR can play compressed audio files of human speech. The EasyVR can also output to an 8-ohm speaker (J2 jack in the upper right corner of the board) for feedback and speech synthesis, but that feature was not used in the first demo. Users can make their own custom sound tables from *.wav files using Sensory's Quick Synthesis 5 tool included with the EasyVR software. I had issues running it on Win 7 64-bit, where it could not seem to compress and save the sound files, but it worked OK on a different PC with a 32-bit OS. According to a recent EasyVR forum post, a new version should be available soon that should fix this issue. There is also a fix for 64-bit Windows posted in the forum that helps with some of the sound table build issues.

Audio files must be in *.wav format at 22050 Hz, 1 channel, 16-bit. Audacity, a free open source digital audio editor, can convert most audio files to this format so that they can be used in the Quick Synthesis tool. The EasyVR GUI includes the commands to process and download the custom sound tables produced by Quick Synthesis to the EasyVR module. Whenever building a new sound table, build it, save it, and rebuild it. This is required to update all of the time stamps in the project so that the EasyVR GUI tool will allow downloading the new sound table.
The sound table download tool in the EasyVR GUI operates at 115200 baud, so to download a new sound table to the module's flash, the serial bridge program must be set up for 115200 baud instead of the 9600 baud rate used earlier for speech recognition commands. A pull-up resistor must be attached to the /XM pin to force it above 3V (100 ohms for a 3.3V supply or 680 ohms for a 5V supply), and power must be cycled after the pull-up is in place. Here is the bridge code to download new sound tables:
In the download dialog box, also check the "slow transfers" (115200 baud) box before hitting the final download button. After downloading the new sound table to flash, remove the jumper, cycle power, reload the 9600 baud bridge program, connect, and click on the last sound table group in the left column. It should expand to show the newly downloaded sounds. You can select a new sound and click the speaker icon to play it on the speaker attached to the EasyVR module. I seemed to get a bit more volume on the speaker using a 5V supply for the EasyVR. This process is documented in the newest version of the EasyVR documentation from Veear. A programming and firmware update cable that should be introduced soon might also make the process easier.
Using the EasyVR GUI download tool to program new sound files to flash at 115200 baud
The new sound table should appear back in the EasyVR GUI at 9600 baud
Once the sound table is in flash on the EasyVR module, it can be played back on the speaker with a play command using the index into the sound table, as shown in the GUI image above. A small delay is needed between characters in complex multi-character commands to ensure that a character is not occasionally dropped by the EasyVR UART. This delay is provided by using wait(.001). The EasyVR responds with an "o" after the sound is played back. A C function for playback is shown below. Num is the index into the sound table.
For the second demo, which took a bit more work, several appropriate computer voice response *.wav files were found on the web. Using Audacity, the *.wav files were converted to the correct sample rate for the Quick Synthesis tool. In Quick Synthesis, the audio files were compressed to a low data rate. The default compression technique was used; there are quite a few others to select from, with different size and quality trade-offs. Then, using the EasyVR GUI tool's download option, the new sound table with the compressed audio files was programmed into the EasyVR flash memory.
For a more advanced demo, code was written to use speech synthesis output for vocal user prompts, SI (speaker independent) recognition for the LEDs, and a new SD (speaker dependent) word, mbed, for use as a password. In the EasyVR GUI, the train option was used to add the new SD word, mbed.
To run the demo, you will also need to download the new sound table project to flash, and add and then train the password (mbed) in Group 1 using the EasyVR GUI. A zip file of the sound table project is available here.
Keep in mind that noise, distance from the microphone, and variations in the way words are spoken will all impact the accuracy of any speech recognition system. There is even a variation in the way an individual speaker says the same word from day to day.
Users can develop speaker dependent (i.e., trained for one speaker based on samples) recognition words with the EasyVR GUI tool that comes with the EasyVR module. For users that want to develop their own custom speaker independent (i.e., works for any speaker) recognition words, additional software is needed from Sensory (Quick T2SI) that does not come with the module. The larger and more expensive VoiceGP DK-T2SI board comes with this additional software.
There are some open source text-to-speech synthesis tools, such as eSpeak, that produce computer-generated speech whose output can be saved as *.wav files, but they require a fairly large amount of memory and some file space. If you did not want to use human speech, they could be used offline to generate a computer-sounding voice for the EasyVR module by saving the *.wav files. Recorded human speech is typically easier to understand. There are also several open source speech recognition programs available for embedded devices, such as PocketSphinx.
The password group in the EasyVR tool uses SV (speaker verification) and requires a more precise match, so it must be trained under similar conditions (environment noise and distance from the microphone). Speaker verification technology uses word-spotting techniques to dramatically enhance password biometric accuracy in noisy environments.
Another interesting project would be to use the EasyVR for voice control of a robot such as the Roomba or iCreate. It has a built-in speaker independent vocabulary for robot movement, and this is one of the primary target markets for the device.
In case you missed them, this video of a 2005 Furby II shows the toy's built-in speech recognition and synthesis capabilities. The software from Sensory can also keep track of when the mouth should move (called lipsync in the tool).
I have a Furby and would like to play with the facial movements. Unfortunately, I have found that everything runs off one reversible motor, so all the movements work in one rotation. I am not sure exactly where to start with it, and that video is exactly what I'm looking for.
I saw a breakdown for the first ones on the web - I think it was called a "Furby autopsy" and another one was "hack Furby." We took one apart recently in the lab and you are right - it is a bit of a mechanical nightmare for the facial movements.
See -0305-C.pdf for a quick overview of the terminology and software - Speaker dependent means one person. Speaker independent means anyone. It comes with the speaker dependent software so that you can train it for one person at a time with your own new words (not everyone at once) and about 30-40 built-in speaker independent words in several languages. There is a big variance between different people, and speaker independent is a harder problem.