1. Receive raw audio from a network/internet stream (done)
2. Test the packets to be sure they have viable audio content (done)
3. Write the audio bytes to the AudioInputStream property of the Recognizer
object (?????)
4. Raise events when the spoken words are recognized (done)
The test above works perfectly when the incoming audio is first written as a
RIFF/wav file and then fed to the speech engine. THIS IS CLEARLY A BAD WAY TO
INPUT THE SPEECH:-)
Can anyone help me?
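Step 2 of the list above (checking that a packet contains viable audio) can be done in many ways. A minimal sketch, assuming the packets carry 16-bit signed PCM samples, is a simple RMS energy threshold. `HasViableAudio` and its default threshold of 500 are hypothetical and would need tuning against real traffic.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if the RMS level of a block of 16-bit PCM
// samples exceeds a threshold. The default threshold is an assumption.
bool HasViableAudio(const std::vector<int16_t>& samples, double rmsThreshold = 500.0) {
    if (samples.empty()) return false;
    double sumSquares = 0.0;
    for (int16_t s : samples) {
        const double v = static_cast<double>(s);
        sumSquares += v * v;
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(samples.size()));
    return rms > rmsThreshold;
}
```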
--
Jon L. Arking, SCJP, MCSD
Jon,
you may want to (cross-)post your SAPI-related questions to:
microsoft.public.sapi5.beta
--
Best Regards
Andreas Marschall
Microsoft MVP for TAPI / Windows SDK
TAPI / TSP Developer and Tester
http://www.I-B-A-M.de/Andreas_Marschall's_TAPI_and_TSPI_FAQ.htm
* Please post all messages and replies to the newsgroup so all may
* benefit from the discussion. Private mail is usually not replied to.
* This posting is provided "AS IS" with no warranties, and confers no rights.
Jon,
this was the only SAPI newsgroup I found on the msnews.microsoft.com news server.
There are some others related to speech:
microsoft.public.netspeechsdk
microsoft.public.speech_tech
microsoft.public.speech_tech.sdk
The best place to post questions about SAPI 5.1 is
microsoft.public.speech_tech.sdk. Various Microsoft employees (and other
helpful people) monitor these newsgroups, but unfortunately not every message
gets a response.
To answer your question briefly: if you're not using TAPI and you want
the data to be streamed in real time (i.e. not waiting for all the audio to
arrive before sending it to the engine), then you need to implement your
own SAPI audio object. This is a COM object that implements the ISpAudio
interface, which allows SAPI to call its Read() method to pull audio data
from your object and pass it to the engine. The ISpAudio interface is
documented in the SAPI 5.1 help. However, writing an object like this is not
necessarily simple, and requires a reasonable understanding of COM and
threading. There is a sample in the SDK called Simple Audio that is an example
audio object of this type, and a couple of help pages in the SDK help may also
be useful: "Using Sample Audio Object" (in the "Whitepapers" section) and
"AudioApp for VisualBasic" (in the "SDK Samples (Automation)" section).
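The threading requirement comes from the fact that audio arrives from the network on one thread while SAPI pulls it through Read() on another. A minimal sketch of only the buffering part, in portable C++: a blocking FIFO that a hypothetical custom audio object's Read() could drain. The class name `PcmQueue` is hypothetical, and the real ISpAudio contract (format negotiation, state changes, events) involves much more than this.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical thread-safe byte FIFO: a network thread calls Write(), and the
// Read() of a custom audio object (called by SAPI on its own thread) pulls from it.
class PcmQueue {
public:
    void Write(const uint8_t* data, std::size_t len) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            buf_.insert(buf_.end(), data, data + len);
        }
        cv_.notify_all();
    }

    // Signals end of stream; wakes any reader blocked in Read().
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until data is available or the queue is closed, then copies up to
    // 'len' bytes into 'out'. Returns 0 once closed and drained (or if len is 0).
    std::size_t Read(uint8_t* out, std::size_t len) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !buf_.empty() || closed_; });
        const std::size_t n = std::min(len, buf_.size());
        const auto end = buf_.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(buf_.begin(), end, out);
        buf_.erase(buf_.begin(), end);
        return n;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<uint8_t> buf_;
    bool closed_ = false;
};
```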
If it's possible to wait for the audio to finish streaming before starting
recognition, things are simpler. You can use the SpMemoryStream,
SpCustomStream or SpStream classes to feed the audio data in a single chunk
to the recognizer.
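When the whole utterance is fed as a single chunk, the byte count and the format declared on the stream have to agree: at 16 kHz, 16-bit mono, one second of audio is 16000 * 2 = 32,000 bytes. A small sketch of that arithmetic, useful for sanity-checking buffer sizes before handing them to a stream; `PcmFormat`, `BlockAlign`, `IsWholeFrames` and `DurationMs` are hypothetical names, and bit depths below 8 are not handled.

```cpp
#include <cstdint>

// Hypothetical description of an uncompressed PCM format.
struct PcmFormat {
    uint32_t sampleRate;     // samples per second, e.g. 16000
    uint16_t channels;       // 1 = mono
    uint16_t bitsPerSample;  // e.g. 16
};

// Bytes per sample frame (all channels of one sample instant).
uint32_t BlockAlign(const PcmFormat& f) { return f.channels * (f.bitsPerSample / 8u); }

// True if 'byteCount' is a whole number of sample frames for format 'f'.
bool IsWholeFrames(const PcmFormat& f, uint64_t byteCount) {
    return byteCount % BlockAlign(f) == 0;
}

// Duration in milliseconds of 'byteCount' bytes of audio in format 'f'.
uint64_t DurationMs(const PcmFormat& f, uint64_t byteCount) {
    return byteCount * 1000u / (static_cast<uint64_t>(f.sampleRate) * BlockAlign(f));
}
```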
I hope this helps. Let me know if you have further questions,
Dave Wood
Speech Components Group
Microsoft
"Andreas Marschall [MVP TAPI]" <Andreas....@I-B-A-M.de> wrote in
message news:uMb9zq9w...@TK2MSFTNGP09.phx.gbl...
First of all, thank you so very VERY much for taking the time to answer my
question. I think I understand where you're coming from. If I want a direct,
contiguous feed into the engine then I'll need to build the COM object for
Speech automation. That way I simply feed in the bytes and SAPI figures out
when and how to read them. Otherwise, I can wait for chunks of data to come
in and then assign them to the custom stream objects. Is that correct?
I am willing to try either option (I have some experience with C++ COM and
threading), however I have so far had no luck with the custom objects. I have
tried the SpMemoryStream, SpCustomStream, and SpStream objects and I can't
seem to get SAPI to budge! Can you give me some quick sample code showing how
to use them with raw PCM? I can set up the testbed to receive a single spoken
statement in raw PCM, write it out to a wav/RIFF file and feed it into a SAPI
SpFileStream object, and everything works. I just need to eliminate the
file-based middle man and write the PCM directly to the AudioInputStream
field of the Recognizer. IT JUST DOESN'T EVER WORK! Can you help me? If the
wav file works, what am I doing wrong?!!
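Since the wav-file route works, the header of that working file records exactly which format the engine accepted. A sketch for reading that format back from a wav file loaded into memory, so that the same sample rate, channel count and bit depth can be declared on whatever stream receives the raw bytes. It assumes the standard PCM "fmt " chunk layout; `WavFormat` and `ReadWavFormat` are hypothetical names, and it needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical subset of the fields stored in a wav "fmt " chunk.
struct WavFormat {
    uint16_t formatTag;      // 1 = uncompressed PCM
    uint16_t channels;
    uint32_t sampleRate;
    uint16_t bitsPerSample;
};

// Reads a little-endian integer of 'bytes' bytes starting at 'off'.
static uint32_t ReadLE(const std::vector<uint8_t>& b, std::size_t off, int bytes) {
    uint32_t v = 0;
    for (int i = 0; i < bytes; ++i) v |= static_cast<uint32_t>(b[off + i]) << (8 * i);
    return v;
}

// Walks the RIFF chunks and returns the "fmt " fields, or std::nullopt.
std::optional<WavFormat> ReadWavFormat(const std::vector<uint8_t>& b) {
    if (b.size() < 12 || std::memcmp(b.data(), "RIFF", 4) != 0 ||
        std::memcmp(b.data() + 8, "WAVE", 4) != 0)
        return std::nullopt;
    std::size_t off = 12;
    while (off + 8 <= b.size()) {
        const uint32_t size = ReadLE(b, off + 4, 4);
        if (std::memcmp(b.data() + off, "fmt ", 4) == 0 && size >= 16 &&
            off + 8 + 16 <= b.size()) {
            const std::size_t p = off + 8;
            return WavFormat{static_cast<uint16_t>(ReadLE(b, p, 2)),
                             static_cast<uint16_t>(ReadLE(b, p + 2, 2)),
                             ReadLE(b, p + 4, 4),
                             static_cast<uint16_t>(ReadLE(b, p + 14, 2))};
        }
        off += 8 + size + (size & 1u);  // chunk bodies are padded to even length
    }
    return std::nullopt;
}
```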
I am having a similar problem to the one you brought up in this thread. Have
you found a solution?
Could you please describe your actual problem in detail? Maybe others can
help, too.
Best regards,
Matthias Moetje
-------------------------------------
TERASENS GmbH
Ackermannstraße 3
80797 München
-------------------------------------
Fon: +49 89 143370-0
Fax: +49 89 143370-22
e-mail: moetje at terasens dot de
www: www.terasens.de
-------------------------------------
"asiboro" <asi...@discussions.microsoft.com> wrote in message
news:AA701FA1-82F1-41CE...@microsoft.com...
First, I've done the following:
1. Record audio input from application (PCM mono 16 bit)
2. Bind to wav file and save it
3. Feed it to SAPI (succeeded in recognizing)
Now I'm trying to replace step 2, i.e. to do without the intermediate file.
I have the audio data in bytes. So, to "whom" should I pass this
sequence of bytes?
I've tried to use SpStream, SpCustomStream, SpMemoryStream and SpStreamFormat,
but they failed when I called Write(...) to write the audio bytes.
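Compared with the raw bytes, the only thing the intermediate file in step 2 adds is a RIFF header describing the audio format (sample rate, channels, bits per sample). A sketch of the canonical 44-byte header for plain PCM, built in memory, shows what that is. `MakeWav` is a hypothetical helper, and whether a particular SAPI stream class accepts such a header from memory is an assumption not tested here.

```cpp
#include <cstdint>
#include <vector>

// Appends 'value' as 'bytes' little-endian bytes (RIFF is little-endian).
static void PutLE(std::vector<uint8_t>& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
}

// Appends a four-character chunk tag such as "RIFF".
static void PutTag(std::vector<uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(tag[i]));
}

// Hypothetical helper: wraps raw PCM bytes in a canonical 44-byte RIFF/WAVE header.
std::vector<uint8_t> MakeWav(const std::vector<uint8_t>& pcm, uint32_t sampleRate,
                             uint16_t channels, uint16_t bitsPerSample) {
    const uint16_t blockAlign = static_cast<uint16_t>(channels * (bitsPerSample / 8));
    const uint32_t byteRate = sampleRate * blockAlign;
    const uint32_t dataSize = static_cast<uint32_t>(pcm.size());

    std::vector<uint8_t> wav;
    wav.reserve(44 + pcm.size());
    PutTag(wav, "RIFF");
    PutLE(wav, 36 + dataSize, 4);   // size of everything after this field
    PutTag(wav, "WAVE");
    PutTag(wav, "fmt ");
    PutLE(wav, 16, 4);              // fmt chunk size for plain PCM
    PutLE(wav, 1, 2);               // format tag 1 = uncompressed PCM
    PutLE(wav, channels, 2);
    PutLE(wav, sampleRate, 4);
    PutLE(wav, byteRate, 4);
    PutLE(wav, blockAlign, 2);
    PutLE(wav, bitsPerSample, 2);
    PutTag(wav, "data");
    PutLE(wav, dataSize, 4);
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```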
According to the documentation, I need to implement ISpAudio, and finally I
found the sample: ISpAudioPlug.
Can I use it?
I'm now attempting to use ISpAudioPlug, but it fails when constructing
the object.
#import "C:\Lib\simpleaudio.dll"
using namespace SIMPLEAUDIOLib;
...
CComPtr<ISpAudioPlug> strm;
strm.CoCreateInstance( ??? ); // I found no CLSID in the *.tlh generated by the
                              // compiler, hence I'm stuck here.
If this is not the correct way, any clues would be appreciated.
I didn't quite understand the description of the options in the first
post, but we have direct streaming for TTS and ASR working
with our application (although we chose to use it only for ASR).
Have you had a look at the "TapiCustomStream" sample from the
Speech SDK (5.1)? I think it will help you do what you want.
Best regards,
Matthias Moetje
"Joni Zhang" <Joni...@discussions.microsoft.com> wrote in message
news:E9E30D38-6F1F-4854...@microsoft.com...
I'm not using TAPI. Is there a solution without TAPI?
I already have the raw audio data bytes (PCM mono 16 bit).
Suppose that I captured the audio bytes from a microphone.
Now I need to pass these bytes to SAPI,
so I would need to write them to a buffer (ISpAudioPlug??).
From the SDK sample (VB AudioApp), I got this:
Voice from mic (SpVoice) -> AudioPlugOut -> AudioPlugIn -> Recognizer
In my case, I think it should be like this (?):
Raw audio data bytes (unsigned char/BYTE) -> AudioPlugOut -> AudioPlugIn ->
Recognizer
I have no idea how to pass them to ISpAudioPlug.
The problem is really only how to pass these bytes to SAPI.
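If the capture side delivers samples as 16-bit integers, the BYTE sequence passed on towards the recognizer would normally be those samples in little-endian order, which is how 16-bit PCM is stored in wav files. A minimal sketch of that conversion; `SamplesToBytes` is a hypothetical helper, and on a little-endian machine such as x86 a plain memcpy of the sample array produces the same bytes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: serializes signed 16-bit samples into the little-endian
// byte layout used by 16-bit PCM in wav files.
std::vector<uint8_t> SamplesToBytes(const std::vector<int16_t>& samples) {
    std::vector<uint8_t> bytes;
    bytes.reserve(samples.size() * 2);
    for (int16_t s : samples) {
        const uint16_t u = static_cast<uint16_t>(s);
        bytes.push_back(static_cast<uint8_t>(u & 0xFF));  // low byte first
        bytes.push_back(static_cast<uint8_t>(u >> 8));    // then high byte
    }
    return bytes;
}
```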
If you are not using TAPI, then this is off-topic (OT) here in the TAPI newsgroup.
Still, you might get some hints from the sample I mentioned.
Best regards,
Matthias Moetje
"Joni Zhang" <Joni...@discussions.microsoft.com> wrote in message
news:5B9A776A-BADA-4F07...@microsoft.com...