Japanese is easier to recognize because it has fewer phonemes, by the
way. It has only the five basic vowels, fewer consonants than English,
and *far* fewer blended phonemes. It does have an awful lot of
homophones, though, so I'm sure you'd need some pretty powerful
statistics to disambiguate.
Keep in mind that 90% accuracy means one word in every ten is an error
-- that's one to two errors per sentence. 95% accuracy is one wrong
word in every 20. 99% accuracy is one error in every 100 words, or
about one per paragraph. As a CART provider, I aim for a standard of
99.9% accuracy, which is about one error in every 1,000 words, or
one every four double-spaced pages.
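If you want to check that arithmetic yourself, here's a rough sketch:
an accuracy of A percent works out to one error in every 100 / (100 - A)
words. The function name and the 250-words-per-double-spaced-page figure
are my own assumptions for illustration, not any kind of industry
standard.

```python
# Assumed figure for illustration only: roughly 250 words fit on one
# double-spaced page.
WORDS_PER_PAGE = 250


def words_per_error(accuracy_percent: float) -> float:
    """Average number of words per error at the given accuracy (in percent)."""
    if not 0 <= accuracy_percent < 100:
        raise ValueError("accuracy must be at least 0 and below 100 percent")
    return 100.0 / (100.0 - accuracy_percent)


for pct in (90, 95, 99, 99.9):
    words = round(words_per_error(pct))
    pages = words / WORDS_PER_PAGE
    print(f"{pct}% accuracy: about one error in every {words} words "
          f"(~{pages:.2f} double-spaced pages)")
```

Under that page-length assumption, 99.9% comes out to one error in every
four pages, while 90% is one error in every ten words.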
Other notes on voice writing:
The supposedly short training period is voice writing's major selling
point over steno (aside from the cost of equipment), but from what I
can tell, it's not actually true. You can train someone to a moderate
degree of accuracy very quickly; all they have to do is speak into the
microphone slowly and clearly, and the software will get a fair number of words
correct. For dictation or offline transcription, this can work well,
assuming they have the stamina to speak consistently for long periods
of time, because they can stop, go back, and correct errors as they
make them. But actual live realtime respeaking at CART levels of
accuracy (ideally over 99% correct) is much harder.
* Short words are more difficult for the speech engine to recognize
than multisyllabic words are, and are more likely to be ignored or
mistranscribed.
* If the voice captioner does mostly direct-echo respeaking, meaning
that they don't pronounce words in nonstandard ways, they have to
repeat multisyllabic words using the same number of syllables as in
the original audio; if they try to "brief" long words by assigning a
voice macro that lets them say the word in one syllable, they run up
against the software's difficulty in dealing with monosyllabic words
that I mentioned above.
* Because they're mostly saying words in the same amount of time as
they were originally spoken (unlike in steno, where a multisyllabic
word can be represented by a single split-second stroke), they don't
have much "reserve speed" to make corrections if the audio is
mistranscribed. They also have to verbally insert punctuation and use
macros to differentiate between homophones, which takes time and can
be fatiguing.
* Compensating for the lack of reserve speed by speaking the words
more quickly than they were originally spoken can also be problematic,
because the software is better able to transcribe words spoken with
clearly delineated spaces between them, as opposed to words that are
all run together.
* This means that if the software makes a mistake and the audio is
fairly rapid, the voice captioner is forced to choose: either take
time to delete the mistake and then catch up by paraphrasing the
speaker, or keep up with the speaker and let the mistake stand.
* Also, the skill of echoing previously spoken words aloud while
listening to a steady stream of incoming words can be quite tricky,
especially when the audio quality is less than perfect; unlike
simultaneous writing and listening, simultaneous speaking and
listening can cause cross-channel interference.
So yeah.
Low or moderate accuracy offline voice writing = short training
period, most people can do it.
Low or moderate accuracy realtime voice writing = somewhat longer
training period, machine-compatible voice timbre and accent required.
CART-level accuracy realtime voice writing = extremely long training
period, an enormous amount of talent and dedication required.
This is why steno hasn't been supplanted yet, and isn't likely to be,
as long as CART clients refuse to accept inaccurate realtime.
More here: http://stenoknight.com/VoiceVersusCART.html
And here: http://plover.stenoknight.com/2010/06/cart-court-and-captioning.html