100% noobie question, but I need someone to point me in the right direction,
so I appreciate any constructive answers that I can get.
I'm developing an application that will do Speech-To-Text, then Text-To-Speech
in multiple languages. I can easily do this with the installed API recognizers
for English, Chinese, and Japanese.
I have Googled this problem 7 ways to Sunday, I've been on MSDN and every
other legitimate source, but I have not been able to figure out how to get
other languages supported. Languages such as French, Spanish, German,
Icelandic, Pig Latin, etc. My biggest roadblock seems to be a pure lack of
available information.
I got excited briefly and thought I was on the right track with MUI packs,
but I was wrong.
I'm fairly certain that we (the US) and eastern Asia are not the only folks
interested in having a two way conversation with their computer. But I've
been wrong before.
It should also be known that at home I'm developing on XP Home SP3 with C#
and .Net 3.5. I'm working with System.Speech as well as the SAPI 5.1
directly.
I also have RDP access to a Server 2003 (which I understand has greater
language support) box with a full set of development and localization tools,
utilities, and libraries installed.
I need to expand my limit from 3 languages to cover most major dialects of
the world. I will be very grateful for any information that can help me get
there. It can't be all that difficult right?
I don't try to bribe very often, but I will profit from this information and
I need it badly, so someone else should profit too...
If you can help me get over this hump I'll make a $20 donation to the
charity of your choice on my next payday (15th or 16th of December) ...as
long as I can do it via a Paypal Donate button on a legitimate website.
Simple rules to avoid disputes:
1. First-come, first-served.
2. No reward for duplicate answers.
3. Multiple Unique helpful posts will be my decision to reward, but will
most likely be rewarded as long as the rules are followed.
4. If there are too many I may have to put some off until the 1st or 2nd of
January (my next payday), but I will honor all that I determine to be
worthy.
5. This offer stands for 48 hours beginning 12:00 AM December 11, 2009.
6. A "Charity" is defined as a non-profit organization that works for a
cause to improve a situation.
7. I will research each entry to determine legitimacy, and the final
decision is my own.
I'm a man of my word. Please help.
--
Roger Frost
Logic Is Syntax Independent
Regarding languages and dialects, companies like Loquendo offer TTS engines
intended for use with specific languages and/or dialects (which is VERY
important - we use them ourselves for this purpose; Catalan, for example).
Hans
Let me just catch you up to where I currently am... I've been meaning to do
this anyway for future reference and to help others.
I scrapped the Speech SDK plan and decided to take the UCMA approach. This
solved my problem of too few available recognizers (I need to stay away from
3rd-party solutions as much as possible), and another benefit is the managed
code.
This also pretty much makes this an off topic post, but I don't want to
lose the thread since you replied here, and I think my root problem will
apply to the desktop API as well as the server variants.
UCMA is supported on Server 2003 and 2008 so compatibility isn't an issue
for me.
Also, the transition was almost painless as the namespace (and class
definitions) for UCMA is structured just like System.Speech, so all I had to
do was add the reference to Microsoft.Speech, remove ref's to System.Speech,
then update using statements to Microsoft.Speech from System.Speech. More
on this in a moment.
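As a minimal sketch of that migration (assuming the UCMA 2.0 assembly,
Microsoft.Speech.dll, is referenced - the class and member names I use here
carried over unchanged from System.Speech in my experience):

```csharp
// Before (desktop SAPI stack, ships with the .NET Framework):
// using System.Speech.Recognition;
// using System.Speech.Synthesis;

// After (UCMA / server stack; requires a reference to Microsoft.Speech.dll):
using Microsoft.Speech.Recognition;
using Microsoft.Speech.Synthesis;

class Transcriber
{
    // The types keep the same names and shapes across both namespaces,
    // so the rest of the code compiles unchanged after the swap.
    private readonly SpeechRecognitionEngine engine =
        new SpeechRecognitionEngine();
}
```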
So now I have 12 (I think) unique recognizers, and that is an excellent
start toward what I need in the end. Again, I'll consider this problem solved
for now, as most of the world's major languages are now covered in my
application. I might also add that Text-To-Speech seems to be working great
with my current setup.
Okay, back to the transition from System.Speech to Microsoft.Speech (UCMA).
Everything went fantastic, except Microsoft.Speech.Recognition doesn't have
the DictationGrammar class that is available in System.Speech.Recognition.
When I was using the Speech SDK I was relying on DictationGrammar, and for
the 3 languages available with that API, it worked great for "free
dictation".
I'm currently trying to use GrammarBuilder to build a dictation Grammar
object that I can substitute for my old DictationGrammar method. I know this
is possible...in fact I think DictationGrammar is just a shorthand way of
getting there in System.Speech.
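For reference, here's how the two routes compare under System.Speech (a
sketch, not a full program; the comment about why the exception fires is my
reading of the behavior, not something documented):

```csharp
using System.Speech.Recognition;

class DictationDemo
{
    static void Main()
    {
        var engine = new SpeechRecognitionEngine();

        // The shorthand: DictationGrammar loads the engine's built-in
        // free-dictation grammar directly.
        engine.LoadGrammar(new DictationGrammar());

        // The long way: AppendDictation() inserts a *reference* to that same
        // built-in dictation grammar. If the recognizer doesn't ship one
        // (as seems to be the case with the UCMA server recognizers), loading
        // throws "Cannot find grammar referenced by this grammar."
        var gb = new GrammarBuilder();
        gb.AppendDictation();
        engine.LoadGrammar(new Grammar(gb));
    }
}
```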
The GrammarBuilder.Append* methods and their usage just look like Greek to
me. I keep getting the exception:
"Cannot find grammar referenced by this grammar." Even when using the *few*
examples on MSDN, it's a no-go: same exception.
From what I can tell, building a dictation grammar object should be the
easiest of all because there are no phrases to deal with; basically you just
tell the engine to transcribe everything it possibly can.
Just to vent a little bit, I'm very disappointed that someone (namely
Microsoft) took the time to refine a great technology, build an API for it,
then distribute it freely for developers to build "nicer" Windows
apps....but forgot to document it. (And for everyone about to argue with me
over semantics, if you can find me a solid example of a grammar for free
dictation on the internet that works as expected with UCMA then I'll eat my
shoe, foot and all...)
But seriously folks, I appreciate any guidance in the right direction on
this problem...again, it can't be that hard, right? I'll post some code that
doesn't work (throws the above exception); this is a Microsoft example, and I
apologize for the line breaks....
// start code
this.engine = new SpeechRecognitionEngine(Application.CurrentCulture);

GrammarBuilder dictaphoneGB = new GrammarBuilder();

GrammarBuilder dictation = new GrammarBuilder();
dictation.AppendDictation();

dictaphoneGB.Append(new SemanticResultKey("StartDictation",
    new SemanticResultValue("Start Dictation", true)));
dictaphoneGB.Append(new SemanticResultKey("DictationInput", dictation));
dictaphoneGB.Append(new SemanticResultKey("EndDictation",
    new SemanticResultValue("Stop Dictation", false)));

GrammarBuilder spelling = new GrammarBuilder();
spelling.AppendDictation("spelling");

GrammarBuilder spellingGB = new GrammarBuilder();
spellingGB.Append(new SemanticResultKey("StartSpelling",
    new SemanticResultValue("Start Spelling", true)));
spellingGB.Append(new SemanticResultKey("spellingInput", spelling));
spellingGB.Append(new SemanticResultKey("StopSpelling",
    new SemanticResultValue("Stop Spelling", true)));

GrammarBuilder both = GrammarBuilder.Add(
    (GrammarBuilder)new SemanticResultKey("Dictation", dictaphoneGB),
    (GrammarBuilder)new SemanticResultKey("Spelling", spellingGB));

Grammar grammar = new Grammar(both);
grammar.Enabled = true;
grammar.Name = "Dictaphone and Spelling";

this.engine.LoadGrammar(grammar); // Throws FileNotFoundException:
                                  // "Cannot find grammar referenced by this grammar."
// end code
By the way, there shouldn't be any file I/O happening here, at least none
that is not implied, since the example notes didn't mention it.
The above example can be found here:
http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.grammarbuilder.debugshowphrases(office.13).aspx
Here is another one that is broken at the same line of code (very similar to
the previous one):
http://msdn.microsoft.com/en-us/library/ms576566.aspx
Thanks,
Roger
What did you install to get more recognizers, if I might ask?
I am developing on Windows Server 2008 right now..
I switched to UCMA 2.0 because SAPI didn't have the multi-lingual support I
was looking for (and if it did, I couldn't find it).
For more recognizers to use with UCMA you will need to download Microsoft
Office Communications Server 2007 R2 UCMA 2.0 Speech Language Packs.
UCMA is supported on Server 2003 and 2008.
Regards,
Roger
"cornelyus" <corn...@discussions.microsoft.com> wrote in message
news:284C5923-431E-4ABE...@microsoft.com...
One thing I don't understand.. UCMA is for server-side applications, and
SAPI 5.3 (System.Speech) is for client-side apps? I still can't figure out
all the different SDKs and such for developing speech recognition..
Since my application is going to run on Server 2003, I'm okay with using
UCMA even though the app itself really isn't a server program per se. If
that makes any sense.
I would have loved to use SAPI but recognizer support for languages seems to
be lacking, with just Chinese, Japanese and English available. This is
enough to prove a concept, but not as many as I need. There may be 3rd
party recognizers available for other languages, but I need to stay away
from them for this project.
I'm still trying to get a dictation grammar object that works. The funny
thing is, this should be the easiest grammar object to create, and in my
opinion it should be the default. By default, I mean when you initialize a
new SpeechRecognitionEngine object with no other Grammar objects added, then
dictation should be implied. There are no phrases, or choices or semantic
results, it's simply "transcribe everything you can".
So tell me, what are you trying to do with the speech APIs? I haven't
been successful on my quest as of yet, but I have learned a few things.
-Roger
"cornelyus" <corn...@discussions.microsoft.com> wrote in message
news:838E4877-4B1D-4BD7...@microsoft.com...
I am developing on Win Server 2008; unfortunately this wasn't such a good
idea, because I don't have the English recognizers that come by default on
Windows Vista and Windows 7 (can't quite understand this either).
Talking about lack of documentation, have you heard about SAPI 5.4? Yes, it
exists, and I still don't know the differences from 5.3:
http://msdn.microsoft.com/en-us/library/ee125663(VS.85).aspx
"Roger Frost" wrote:
You can get the recognizers for the Speech SDK from the download page...
SpeechSDK51LangPack.exe gives you English, Chinese and Japanese recognizers,
but I don't know if it will install on Server 2008, Vista, or 7. These
aren't listed in the System Requirements, but neither is Server 2003, where
I have successfully installed and used it. Also, this system had the SDK
installed beforehand, which may be a prerequisite, in which case you will
probably need to use either System.Speech or SpeechLib to access these
recognizers.
I do know however that if you install Unified Communications Managed API
(UCMA) v 2.0 on your Server 2008 system, there are a plethora of language
packs available, including English. And since you're using command grammar,
you won't face the same problems I have. I think UCMA is the direction that
Microsoft is taking with Speech technology, so jumping on board early won't
hurt.
Interestingly enough, I've seen documentation on the SAPI for versions > 5.1
as you posted, however I've only found the download page for SDK version 5.1
thus far. Is there a Speech SDK 5.4 to go along with that SAPI 5.4 I
wonder?
-Roger
--------------------------------------------------
From: "cornelyus" <corn...@discussions.microsoft.com>
Sent: Monday, December 21, 2009 4:13 AM
Newsgroups: microsoft.public.speech.desktop
Subject: Re: SAPI Multi-lingual capabilities (for charity)
When you install that Speech SDK 5.1, what version of the recognizers do you
get? Because from what I saw there are the 5.1 versions, and on Windows Vista
the default recognizers are version 8.0, which seem a lot better than 5.1.
I tried installing what you said, the UCMA, but I don't have access to more
recognizers than I already had, and it jammed the only ones I had; my program
stopped working because it couldn't identify recognizers.
We need more documentation!!!
As for SAPI 5.4, I just found that link, and no download possibility.
Ricardo
"Roger Frost" wrote:
Bottom line,
what did you install, and what recognizers do you have when you open Control
Panel and the Speech tab?
Thank you
Ricardo
"Roger Frost" wrote:
The MSI that the recognizers were packaged in for the Speech SDK was version
5.1.
Sorry for the problems you had with that.
The UCMA voices are also much smoother than the Windows XP, or SAPI, ones.
In reply to your 2nd post, I downloaded the UCMA Language Packs from here:
There are 12 available (counting French (Canada) and French (France), English
(GB) and English (US), etc.)...
I second that; we do need documentation, much more than we need new SDKs
and APIs at this point!
-Roger
"cornelyus" <corn...@discussions.microsoft.com> wrote in message
news:ECB51D5B-0641-4BED...@microsoft.com...
Can I use the e-mail you have on your profile to contact you?
Regards,
Ricardo
"Roger Frost" wrote:
"cornelyus" <corn...@discussions.microsoft.com> wrote in message
news:0C38425D-A41A-4117...@microsoft.com...
Update:
Installed the UCMA SDK again, and the language packs for English (UK) and
(US).. noticed that in Control Panel the recognizers I had stayed the same.
Then changed in my code all the System.Speech to Microsoft.Speech (added a
reference to this also), and now in my app I list 2 recognizers for English,
and none of the ones I had. I understand from this that System.Speech and
Microsoft.Speech use different recognizers, even though they are all
installed on the machine..
I was using grammars in XML; now the grammars are all in grxml. They seem the
same but aren't exactly. Even the grxml that came with the installation of
UCMA wouldn't work before I made some changes.
All in all.. for me it's not worth the effort of changing now to UCMA and
their recognizers + grammars. I will continue my development on Server 2008
and use a machine with Windows Vista / 7 to make my tests.
Thanks for all the info Roger.
Thanks for all the info Roger.
"Roger Frost" wrote:
At least it's easy to switch back and forth for comparisons, like you said,
just change the reference Microsoft.Speech <> System.Speech, and update
using statements.
As far as the grammar, I think the files that install with UCMA are for the
demos, and I didn't have any luck with them either.
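For anyone comparing notes, here is a minimal SRGS grammar (.grxml) sketch of
the kind those files are supposed to contain - the rule names and phrases are
just illustrative, and my assumption is that xml:lang must match one of the
installed recognizers (en-US here):

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- A single public rule that matches one of two command phrases. -->
  <rule id="command" scope="public">
    <one-of>
      <item>start dictation</item>
      <item>stop dictation</item>
    </one-of>
  </rule>
</grammar>
```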
If you get bored, see if you can get dictation working in UCMA (or SAPI)
WITHOUT using the DictationGrammar class. I'm offering a reward; here, I'll
even give you some code to get you started (pardon the line breaks):
public partial class SpeechToTextForm : Form
{
    SpeechRecognitionEngine engine;

    public SpeechToTextForm()
    {
        InitializeComponent();

        this.engine = new SpeechRecognitionEngine();
        this.engine.SpeechDetected +=
            new EventHandler<SpeechDetectedEventArgs>(engine_SpeechDetected);
        this.engine.SpeechHypothesized +=
            new EventHandler<SpeechHypothesizedEventArgs>(engine_SpeechHypothesized);
        this.engine.SpeechRecognitionRejected +=
            new EventHandler<SpeechRecognitionRejectedEventArgs>(engine_SpeechRecognitionRejected);
        this.engine.SpeechRecognized +=
            new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

        GrammarBuilder dictaphoneGB = new GrammarBuilder();

        GrammarBuilder dictation = new GrammarBuilder();
        dictation.AppendDictation();

        dictaphoneGB.Append(new SemanticResultKey("StartDictation",
            new SemanticResultValue("Start Dictation", true)));
        dictaphoneGB.Append(new SemanticResultKey("dictationInput", dictation));
        dictaphoneGB.Append(new SemanticResultKey("EndDictation",
            new SemanticResultValue("Stop Dictation", false)));

        GrammarBuilder spelling = new GrammarBuilder();
        spelling.AppendDictation("spelling");

        GrammarBuilder spellingGB = new GrammarBuilder();
        spellingGB.Append(new SemanticResultKey("StartSpelling",
            new SemanticResultValue("Start Spelling", true)));
        spellingGB.Append(new SemanticResultKey("spellingInput", spelling));
        spellingGB.Append(new SemanticResultKey("StopSpelling",
            new SemanticResultValue("Stop Spelling", true)));

        GrammarBuilder both = GrammarBuilder.Add(
            (GrammarBuilder)new SemanticResultKey("Dictation", dictaphoneGB),
            (GrammarBuilder)new SemanticResultKey("Spelling", spellingGB));

        Grammar grammar = new Grammar(both);
        grammar.Enabled = true;
        grammar.Name = "Dictaphone and Spelling";

        this.engine.LoadGrammar(grammar); // <== You should get an exception
                                          // here if your luck is like mine.
    }

    void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
    }

    void engine_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
    {
    }

    void engine_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
    {
    }

    void engine_SpeechDetected(object sender, SpeechDetectedEventArgs e)
    {
    }
}
--
Roger Frost
"It's all about finding the correct solution. The solution exists,
this is just software :) There are no non-solutions." -Anthony Nystrom
"cornelyus" <corn...@discussions.microsoft.com> wrote in message
news:2A15E1C4-2E46-4622...@microsoft.com...
Thanks for all replies..
Sorry, but I'm really swamped with work now.. the English recognizer is
getting me a lot of wrong recognitions, and I don't know what to do to get
this better..
Check out Control Panel > Speech > Tab: Speech Recognition > Group:
Recognition Profiles
There is a Train Profile button in there, but this is as far as my knowledge
goes since I haven't gotten this far yet with my own project.
--
Roger Frost
"It's all about finding the correct solution. The solution exists,
this is just software :) There are no non-solutions." -Anthony Nystrom
"cornelyus" <corn...@discussions.microsoft.com> wrote in message
news:08A7AEF1-CBA8-4B73...@microsoft.com...