SAPI 5.0 TTS output to buffer

Daniel Heckenberg

Oct 12, 2000, 9:21:08 PM

I'm trying to use the SAPI 5.0 API to do TTS and produce output into a
buffer. My application involves real-time behaviour and is CPU intensive,
so I want to produce output in small blocks (around 1000 samples) as
required. Ideally the TTS processing should be spread evenly over the time
required to output the speech.

After fishing through the documentation and sample code for the SAPI 5.0
beta, my understanding is that I can implement a class with the ISpAudio
interface that should perform the required functions.

This seems like a rather generic scenario, so I'm wondering whether anyone
else has done this and whether my approach is littered with pitfalls. As
the doco is pretty sketchy at the moment, I fear some surprises.

Daniel Heckenberg
Design Engineer
Lake Technology Ltd.

The Microsoft Speech SDK Team

Oct 17, 2000, 3:00:00 AM

Hi Daniel,

There shouldn't be any problems implementing ISpAudio as you described.
Here are a few points:
- You may want to just implement ISpStreamFormat as this may be simpler.
- The threading model on your new object needs to be set to "Both" (For
TTS it won't actually be called on multiple threads, but you still need the
setting).
- ISpVoice::Pause() has some problems when the application is using a
custom audio object, so do not call this method.

Note there may be an easier way to do all this by using the SpStream helper
class. This class already implements ISpStreamFormat, and allows the output
to be set to either a wave file or an IStream. This (oft-repeated) sample
uses CreateStreamOnHGlobal to make an IStream using Win32 global memory and
then passes that to the TTS engine:

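CComPtr<ISpStream> cpStream;
CComPtr<IStream> cpBaseStream;
GUID guidFormat;
WAVEFORMATEX* pWavFormatEx = NULL;

// Create the SAPI stream helper object.
HRESULT hr = cpStream.CoCreateInstance(CLSID_SpStream);
if (SUCCEEDED(hr))
{
    // Back the stream with Win32 global memory. FALSE means the HGLOBAL
    // is not freed when the stream is released; the caller must free it.
    hr = CreateStreamOnHGlobal(NULL, FALSE, &cpBaseStream);
}
if (SUCCEEDED(hr))
{
    // Translate the format enum into a format GUID and a WAVEFORMATEX.
    hr = SpConvertStreamFormatEnum(SPSF_22kHz16BitMono, &guidFormat,
                                   &pWavFormatEx);
}
if (SUCCEEDED(hr))
{
    hr = cpStream->SetBaseStream(cpBaseStream, guidFormat, pWavFormatEx);
    cpBaseStream.Release();
}
if (SUCCEEDED(hr))
{
    // cpVoice is an ISpVoice created elsewhere in the application.
    hr = cpVoice->SetOutput(cpStream, TRUE);
}
// The WAVEFORMATEX is allocated with CoTaskMemAlloc; freeing NULL is safe.
CoTaskMemFree(pWavFormatEx);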

Then, when you want to access the memory, use the Win32 functions
GetHGlobalFromStream and then GlobalLock.
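
Once the memory is locked, pulling audio out in blocks of around 1000 samples is plain pointer arithmetic. What follows is only a sketch under assumptions not confirmed in this thread: the buffer is taken to hold raw 16-bit mono PCM (as SPSF_22kHz16BitMono suggests), and ReadSampleBlock is a hypothetical helper, not a SAPI or Win32 function. The pointer would come from GlobalLock. The byte count is better taken from the stream itself (for example IStream::Stat) than from GlobalSize, which can report more than was written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies one block of 16-bit mono samples out of a raw
// byte buffer (for example, the pointer returned by GlobalLock).
// byteOffset is advanced past the samples that were consumed.
// Returns the number of samples copied, which is fewer than maxSamples
// once the end of the buffer is reached.
std::size_t ReadSampleBlock(const void* data, std::size_t byteCount,
                            std::size_t& byteOffset,
                            std::int16_t* out, std::size_t maxSamples)
{
    assert(byteOffset <= byteCount);
    const std::size_t available =
        (byteCount - byteOffset) / sizeof(std::int16_t);
    const std::size_t n = available < maxSamples ? available : maxSamples;
    // memcpy avoids alignment and aliasing assumptions about the source.
    std::memcpy(out, static_cast<const unsigned char*>(data) + byteOffset,
                n * sizeof(std::int16_t));
    byteOffset += n * sizeof(std::int16_t);
    return n;
}
```

A caller would keep byteOffset between calls and request 1000 samples each time the output device needs more. Note that this only controls how the finished buffer is consumed, not when the engine synthesizes it.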

You certainly don't have to do it this way but it might be simpler.

Hope this helps,

Daniel Heckenberg

Oct 22, 2000, 11:58:23 PM

Thanks for your suggestions. As it happens, I baulked at implementing
ISpAudio, and used the Win32 global stream object to do all the hard work in
an implementation of ISpStreamFormat as you suggest.

This has met all of my immediate requirements but does not allow control of
the synthesis processing. That is, using ISpStream allows you to set where
the synthesis output goes, but does not allow real-time control of the
synthesis so that blocks of audio are only produced as required by the
output device.

Is there any way of achieving this kind of control, other than implementing
ISpAudio?

I suppose that using the Win32 global stream object to implement the
fundamental IStream of the ISpAudio object means that the code would be
reasonably straightforward...
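
One shape such control could take is a bounded buffer between the engine and the output device, so that the producer can never run more than a fixed number of samples ahead of the consumer. The sketch below is hypothetical and is not part of SAPI: SampleRing and its methods are invented names, it is single-threaded, and it simply reports how many samples it accepted. It also assumes that a custom stream object could hold the engine inside its Write call until room is free, which this thread does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity ring buffer of 16-bit samples.
// The producer side would be fed from the stream object's Write;
// the consumer side is drained in blocks by the output device.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity)
        : buf_(capacity), head_(0), size_(0)
    {
        assert(capacity > 0);
    }

    // Accepts as many samples as fit and returns that count. A return
    // value below `count` is the back-pressure signal to the producer.
    std::size_t Write(const std::int16_t* in, std::size_t count)
    {
        const std::size_t n = std::min(count, buf_.size() - size_);
        for (std::size_t i = 0; i < n; ++i)
            buf_[(head_ + size_ + i) % buf_.size()] = in[i];
        size_ += n;
        return n;
    }

    // Removes up to maxSamples of the oldest samples and returns the count.
    std::size_t Read(std::int16_t* out, std::size_t maxSamples)
    {
        const std::size_t n = std::min(maxSamples, size_);
        for (std::size_t i = 0; i < n; ++i)
            out[i] = buf_[(head_ + i) % buf_.size()];
        head_ = (head_ + n) % buf_.size();
        size_ -= n;
        return n;
    }

    std::size_t Size() const { return size_; }

private:
    std::vector<std::int16_t> buf_;
    std::size_t head_;  // index of the oldest sample
    std::size_t size_;  // number of samples currently stored
};
```

With a capacity of two or three output blocks, the consumer would call Read(out, 1000) once per device callback. A real object shared between the engine's thread and the audio thread would also need locking, plus a wait in Write in place of the short count.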

Daniel Heckenberg
Design Engineer
Lake Technology Ltd.

Yash Girdhar

Jun 1, 2014, 3:00:57 AM

Thank you so much for this sample code. It helped me a lot :)

Here is my working solution if anyone needs it.

https://github.com/itsyash/MS-SAPI-demo