Talkitt: Exciting breakthrough in speech recognition/generation


Dallas E Webster

Feb 9, 2015, 12:42:19 AM
to Accessibility SIG
I have been tracking this emerging breakthrough technology from a distance, but have just increased my vigilance. Few people, including potential benefactors, seem to be aware of this effort or of similar work. Your feedback will be appreciated.

Functionally, Talkitt seems to be Dragon NaturallySpeaking (DNS) for people with speech impairments. It claims to work with severe speech impediments, translating unintelligible pronunciation into understandable speech. I suspect it may best be described as Speech-to-Text-to-Speech, although Voiceitt, its developer, calls it pattern recognition and distinguishes it from speech recognition. It is also said to work with any language.

Check out this quick demo and overview

These examples reflect congenital or early speech impairments. Notably, people who have degenerative speech issues (e.g., from ALS or MS) but still speak clearly can use Talkitt out of the box, without training, much as people use Siri or DNS. Talkitt will continue to learn as their speech degrades. A byproduct of using Talkitt in such scenarios is that the person's actual voice is captured and preserved, so his or her later speech will sound like the original.

You probably now have enough to pique your interest.
You can get a little more at


Details on the process seem scarce (naturally well-guarded?), with almost nothing in writing. I did unearth marketing presentations and evangelism in several places, but every exposition with any detail has had poor A/V quality (fuzzy video; overdriven, clipped audio), which makes it difficult to understand. If only I could run the audio through Talkitt...! [There are speech impediments, and there is impeded speech.]

The best I've come across is

Here's what I could tease out from these presentations (with some speculation and inter/extrapolation; feel free to add and correct).

After experimenting with using intonation to create speech or commands, such as for gaming, Voiceitt's growing staff realized they had a life-changing opportunity. The company eventually developed patented "adaptive framing" technologies, which divide words or utterances into smaller "homogenic frames" that can be differentiated to increase accuracy, and then used existing fixed framing to produce the final translations. I suspect it will also use sophisticated audio analysis and clever contextual word prediction and disambiguation, à la what Siri and keyboards such as Swype, Fleksy, and SwiftKey have impressed us with.

Not surprisingly, it may require considerable upfront data entry, training, or calibration, possibly relying on a facilitator: a supportive friend who can already understand the person's speech and match it to the appropriate words. However, it appears that people will be able to use it immediately with the words and phrases on which they have just trained. As Talkitt moves toward production, Voiceitt is working to shorten the learning/training curve. I think (remember: bad A/V) they said they will train with phonemes, phonetics, or words and phrases customized to each user.

Talkitt will initially be accessed through a free Android app (with iOS and desktop software following soon), but it will rely on a subscription service at $19.99 per month, following Siri's cloud-based super-processing/analysis model.

Talkitt was created by Voiceitt, which was founded in Israel but grew up, and now lives, in Boston, among its education/science community. It was incubated under MassChallenge. I first heard about Talkitt when it was nominated in the Healthcare category of the 2014 Verizon Powerful Answers program, but I did not realize then that the technology was more mature, and its launch date much sooner, than I had expected: perhaps 1-3 months from now. It actually became one of the Verizon winners. It was also nominated for TEDMED's Hive last September, receiving the concomitant attention, and was selected by Philips North America as the 2014 winner of the Innovation Fellows Competition.

The pronunciation is "Talk it" and "Voice it", though I find myself saying "Talk-I.T.T" and "Voice-I.T.T" to make the extra "t" register. However, I have not seen any indication that the spellings are an intended play on ITT, the venerable communications giant. Maybe they just added a "t" because "Talkit" and "Voiceit" were taken, or because it looks or feels like a "+".

Next up? -- Bypassing speech altogether and translating thoughts!

Dallas Webster
Assistive Technology Specialist
Upstream Technology
Making the Best of Your Abilities
(512)461-4696 (cell)
(512)795-9763 (voice/fax)
You received this message because you are subscribed to the Google Groups "techlunch" group.
