SAPI 5.1 custom streams

Jonarking

Nov 4, 2004, 11:14:01 PM

Can anyone show me how to use a CustomStream object for inputting audio
directly to the SAPI 5.1 speech engine? The help docs all say it can be done,
but the only examples they give are for wav files! I have tried using the
MemoryStream class, the CustomStream class, the SpMMAudioIn class, and the
base Stream class, but I get different errors each time. Here's what I'm
trying to do:

1. Receive raw audio from a network/internet stream (done)
2. Test the packets to be sure they have viable audio content (done)
3. Write the audio bytes to the AudioInputStream property of the Recognizer
object (?????)
4. Raise events when the spoken words are recognized (done)

The test above works perfectly when the incoming audio is first written as a
RIFF/wav file and then fed to the speech engine. THIS IS CLEARLY A BAD WAY TO
INPUT THE SPEECH:-)

Can anyone help me?
--
Jon L. Arking, SCJP, MCSD

Andreas Marschall [MVP TAPI]

Nov 5, 2004, 2:16:53 AM

"Jonarking" <Jona...@discussions.microsoft.com> wrote in message
news:04C35A7D-8D7B-4CA9...@microsoft.com...

> Can anyone show me how to use a CustomStream object for inputting audio
> directly to the SAPI 5.1 speech engine?

Jon,
you may want to (cross-)post your SAPI related questions to:
microsoft.public.sapi5.beta

--
Best Regards
Andreas Marschall
Microsoft MVP for TAPI / Windows SDK
TAPI / TSP Developer and Tester
http://www.I-B-A-M.de/Andreas_Marschall's_TAPI_and_TSPI_FAQ.htm
* Please post all messages and replies to the newsgroup so all may
* benefit from the discussion. Private mail is usually not replied to.
* This posting is provided "AS IS" with no warranties, and confers no rights.

Jonarking

Nov 5, 2004, 12:04:03 PM

I have tried posting my question on a number of SAPI-related usenet groups.
None of them ever get answered. In fact, the group you suggested has similar
stream questions to my own...none of them have been answered either. Is there
a more direct managed newsgroup I could use?

Andreas Marschall [MVP TAPI]

Nov 6, 2004, 3:27:37 AM

"Jonarking" <Jona...@discussions.microsoft.com> wrote in message
news:885B496C-462F-483E...@microsoft.com...

> I have tried posting my question on a number of SAPI-related usenet groups.
> None of them ever get answered. In fact, the group you suggested has similar
> stream questions to my own...none of them have been answered either. Is there
> a more direct managed newsgroup I could use?

Jon,
this was the only SAPI newsgroup I found on the msnews.microsoft.com news server.
There are some others regarding speech:
microsoft.public.netspeechsdk
microsoft.public.speech_tech
microsoft.public.speech_tech.sdk

Dave Wood [MS]

Nov 10, 2004, 3:02:53 PM

Hi Jon,

The best place to post questions about SAPI 5.1 is
microsoft.public.speech_tech.sdk. Various Microsoft employees (and other
helpful people) monitor these newsgroups, but unfortunately not every message
gets a response.

To answer your question a little bit: if you're not using TAPI, and you want
the data to be streamed in real time (i.e. not waiting for all the audio to
be streamed and then sending it to the engine), then you need to implement your
own SAPI audio object. This is a COM object that implements the ISpAudio
interface, which allows SAPI to call the Read() method to read audio data from
your object. The ISpAudio interface is documented in the SAPI 5.1 help. However,
writing an object like this is not necessarily simple, and requires a
reasonable understanding of COM and threading. There is a sample in the SDK
called Simple Audio that is an example audio object of this type, and a
couple of help pages in the SDK help may also be useful: "Using Sample
Audio Object" (in the "Whitepapers" section), and "AudioApp for VisualBasic"
(in the "SDK Samples (Automation)" section).
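
For the real-time case, the one piece of such an audio object that does not
depend on COM is the buffer sitting between the thread that receives packets
and the Read() calls made by SAPI. Below is a minimal, portable C++ sketch of
that buffer only. PcmByteQueue and its methods are hypothetical names, not
SAPI APIs, and a real ISpAudio implementation needs considerably more than
this (the Simple Audio sample is the place to look for that).

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical helper, not part of SAPI: a thread-safe byte queue that a
// custom audio object could drain from inside its Read() implementation.
class PcmByteQueue {
public:
    // Producer side: called by the receiving thread for each audio packet.
    void Push(const std::uint8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            buffer_.insert(buffer_.end(), data, data + size);
        }
        ready_.notify_one();
    }

    // Producer side: signals that no more audio will arrive.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Consumer side: blocks until at least one byte is queued or the queue
    // has been closed, then copies up to `capacity` bytes into `out`.
    // Returns the number of bytes copied; 0 after Close() means end of stream.
    std::size_t Read(std::uint8_t* out, std::size_t capacity) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !buffer_.empty() || closed_; });
        const std::size_t count = std::min(capacity, buffer_.size());
        std::copy_n(buffer_.begin(), count, out);
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(count));
        return count;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::uint8_t> buffer_;
    bool closed_ = false;
};
```

The receiving thread would call Push() per packet and Close() when the sender
hangs up; the audio object's Read() would forward to PcmByteQueue::Read().
Blocking in Read() is a design choice of this sketch, made so the consumer
never sees a short read just because the network is momentarily idle.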

If it's possible to wait for the audio to finish streaming before starting
recognition, things are simpler. You can use the SpMemoryStream,
SpCustomStream or SpStream classes to feed the audio data in a single chunk
to the recognizer.
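
One thing worth checking in the single-chunk case (an assumption on my part,
not something confirmed in this thread): a wav file carries its own format
header, while a raw PCM buffer does not, so the format has to be described
separately. The sketch below shows the arithmetic for the derived fields. The
PcmFormat struct is a hypothetical, portable mirror of the PCM fields of the
Win32 WAVEFORMATEX structure, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portable mirror of the PCM-related WAVEFORMATEX fields.
struct PcmFormat {
    std::uint16_t channels;        // 1 = mono, 2 = stereo
    std::uint32_t samplesPerSec;   // sample rate in Hz
    std::uint16_t bitsPerSample;   // 8 or 16 for plain PCM
    std::uint16_t blockAlign;      // bytes per sample frame (all channels)
    std::uint32_t avgBytesPerSec;  // bytes consumed per second of audio
};

// Fills in the two derived fields from the three independent ones.
PcmFormat MakePcmFormat(std::uint16_t channels,
                        std::uint32_t samplesPerSec,
                        std::uint16_t bitsPerSample) {
    assert(channels > 0 && bitsPerSample % 8 == 0 && bitsPerSample > 0);
    PcmFormat f{};
    f.channels = channels;
    f.samplesPerSec = samplesPerSec;
    f.bitsPerSample = bitsPerSample;
    f.blockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    f.avgBytesPerSec = samplesPerSec * f.blockAlign;
    return f;
}

// True if the buffer holds a whole number of sample frames.
bool IsWholeFrames(const PcmFormat& f, std::size_t byteCount) {
    return byteCount % f.blockAlign == 0;
}

// Playback duration of a buffer in milliseconds (rounded down).
std::uint64_t DurationMs(const PcmFormat& f, std::size_t byteCount) {
    return (static_cast<std::uint64_t>(byteCount) * 1000u) / f.avgBytesPerSec;
}
```

For example, MakePcmFormat(1, 16000, 16) describes 16-bit mono audio at an
assumed 16 kHz sample rate, giving 2 bytes per frame and 32000 bytes per second.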

I hope this helps, let me know if you have further questions,

Dave Wood
Speech Components Group
Microsoft

--

This posting is provided "AS IS" with no warranties, and confers no rights.

Jonarking

Nov 12, 2004, 1:03:02 PM

Dave,

First of all, thank you so very VERY much for taking the time to answer my
question. I think I understand where you're coming from. If I want a direct,
contiguous feed into the engine then I'll need to build the COM object for
Speech automation. That way I simply feed in the bytes and SAPI figures out
when and how to read them. Otherwise, I can wait for chunks of data to come
in and then assign them into the custom stream objects. Is that correct?

I am willing to try either option (I have some experience with C++ COM and
threading), however I have so far had no luck with the custom objects. I have
tried the SpMemoryStream, SpCustomStream, and SpStream objects and I can't
seem to get SAPI to budge! Can you give me some quick sample code showing how
to use them with raw PCM? I can set up the testbed to receive a single spoken
statement in raw PCM, write it out to a wav/RIFF file and feed it into a SAPI
Filestream object, and everything works. I just need to eliminate the
file-based middle man and write the PCM directly to the AudioInputStream
field of the Recognizer. IT JUST DOESN'T EVER WORK! Can you help me? If the
wav file works, what am I doing wrong?!!
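
For anyone comparing the two paths: what the wav/RIFF file adds to the raw PCM
is a small header that states the format. The sketch below builds the
canonical 44-byte PCM header in memory and appends the samples, so the file
step can at least be reproduced without touching disk. WrapPcmAsWav and the
Append helpers are hypothetical names, and whether any particular SAPI stream
class accepts such a buffer is not something this thread confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 16-bit value in little-endian byte order.
void AppendLE16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFF));
}

// Appends a 32-bit value in little-endian byte order.
void AppendLE32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
}

// Appends a four-character chunk tag such as "RIFF".
void AppendTag(std::vector<std::uint8_t>& out, const char (&tag)[5]) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper: wraps raw PCM bytes in a canonical RIFF/WAVE header.
// Assumes an even data length, which always holds for 16-bit samples.
std::vector<std::uint8_t> WrapPcmAsWav(const std::vector<std::uint8_t>& pcm,
                                       std::uint16_t channels,
                                       std::uint32_t sampleRate,
                                       std::uint16_t bitsPerSample) {
    assert(pcm.size() % 2 == 0);
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    const std::uint32_t byteRate = sampleRate * blockAlign;
    const std::uint32_t dataSize = static_cast<std::uint32_t>(pcm.size());

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + pcm.size());
    AppendTag(wav, "RIFF");
    AppendLE32(wav, 36 + dataSize);   // size of everything after this field
    AppendTag(wav, "WAVE");
    AppendTag(wav, "fmt ");
    AppendLE32(wav, 16);              // size of the PCM format chunk
    AppendLE16(wav, 1);               // format tag 1 = uncompressed PCM
    AppendLE16(wav, channels);
    AppendLE32(wav, sampleRate);
    AppendLE32(wav, byteRate);
    AppendLE16(wav, blockAlign);
    AppendLE16(wav, bitsPerSample);
    AppendTag(wav, "data");
    AppendLE32(wav, dataSize);
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```

The sample rate, channel count and bit depth passed in must match what the
sender actually produced; they cannot be recovered from the raw bytes.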

asiboro

Mar 9, 2005, 12:21:03 AM

Hi Jon,

I am having a problem similar to the one you brought up in this thread.
Have you found a solution?

Matthias Moetje

Mar 9, 2005, 7:06:33 AM

Asiboro,

could you please detail your actual problem - maybe others could
help, too.

Best regards,

Matthias Moetje
-------------------------------------
TERASENS GmbH
Ackermannstraße 3
80797 München
-------------------------------------
Fon: +49 89 143370-0
Fax: +49 89 143370-22
e-mail: moetje at terasens dot de
www: www.terasens.de
-------------------------------------

Joni Zhang

Mar 9, 2005, 10:45:08 AM

Thanks for the response.
Well, actually I am in the same boat as Asiboro, so let me describe the
problem.

First, I've done the following:

1. Record audio input from application (PCM mono 16 bit)
2. Bind to wav file and save it
3. Feed it to SAPI (succeeded in recognizing)

Now I'm trying to replace step 2, i.e. to do this without the intermediate file.
I have the audio data as bytes. So, to "whom" should I pass this
sequence of bytes?

I've tried to use SpStream, SpCustomStream, SpMemoryStream, SpStreamFormat,
but it failed when I called Write(...) to write the audio bytes.

According to the documentation, I need to implement ISpAudio, and I finally
found the sample: ISpAudioPlug.
Can I use it?

I am attempting to use ISpAudioPlug, but I fail when constructing
the object.

#import "C:\Lib\simpleaudio.dll"
using namespace SIMPLEAUDIOLib;

...

CComPtr<ISpAudioPlug> strm;
// I found no CLSID in the *.tlh generated by the compiler, so I'm stuck here.
strm.CoCreateInstance( ??? );

If this is not the correct way, any clues would be appreciated.

Matthias Moetje

Mar 9, 2005, 3:17:16 PM

Joni,

I didn't quite understand the description of options in the first
post, but we have direct streaming for TTS and ASR working
with our application (although we chose to use it only for ASR).

Did you have a look at the "TapiCustomStream" sample from the
Speech SDK (5.1)? I think it will help you do what you want.

Best regards,

Matthias Moetje

Joni Zhang

Mar 11, 2005, 8:17:03 AM

Hi Matthias,
Thanks for the reply.

I'm not using TAPI. Is there a solution that doesn't use TAPI?
I already have the raw audio data bytes (PCM mono 16 bit).

Suppose that I captured audio bytes from the microphone.
Now I need to pass these bytes to SAPI.
So, I would need to write these bytes to a buffer (ISpAudioPlug??)

From the SDK sample (Vb AudioApp), I got this:
Voice from mic (SpVoice) -> AudioPlugOut -> AudioPlugIn -> Recognizer

In my case, I think it should be like this (?):
Raw audio data bytes (unsigned char/BYTE) -> AudioPlugOut -> AudioPlugIn ->
Recognizer

I have no idea how to pass them to ISpAudioPlug.
The problem is really only how to pass these bytes to SAPI.
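
Before handing a BYTE buffer to anything on the SAPI side, it can help to
confirm that the bytes really are what the format says. Below is a small
portable sketch, assuming the data is signed 16-bit little-endian PCM (the
usual layout for wav audio, though not stated explicitly above). It decodes
the bytes into samples and reports the peak level, so silence or a byte-order
mix-up shows up early. DecodePcm16LE and PeakAmplitude are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes signed 16-bit little-endian PCM bytes into sample values.
std::vector<std::int16_t> DecodePcm16LE(const unsigned char* bytes,
                                        std::size_t size) {
    assert(size % 2 == 0);  // 16-bit samples always come in byte pairs
    std::vector<std::int16_t> samples;
    samples.reserve(size / 2);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        // Low byte first, then high byte.
        const int raw = bytes[i] | (bytes[i + 1] << 8);
        // Map 0x8000..0xFFFF onto the negative range explicitly.
        const int value = (raw >= 0x8000) ? raw - 0x10000 : raw;
        samples.push_back(static_cast<std::int16_t>(value));
    }
    return samples;
}

// Returns the largest absolute sample value (0 for an empty or silent buffer).
int PeakAmplitude(const std::vector<std::int16_t>& samples) {
    int peak = 0;
    for (const std::int16_t s : samples) {
        const int magnitude = (s < 0) ? -static_cast<int>(s) : s;
        if (magnitude > peak) {
            peak = magnitude;
        }
    }
    return peak;
}
```

A peak near zero suggests the buffer is silent or was never filled; a peak
pinned near 32768 on speech that should be quiet hints at swapped byte order
or a wrong bit depth.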

Matthias Moetje

Mar 11, 2005, 1:09:18 PM

Joni,

if you are not using TAPI, then this is OT here in the TAPI newsgroup.

Still you might get some hints from the sample I mentioned.

Best regards,

Matthias Moetje