I see two potential approaches:
1) Create a virtual audio device driver. That way, SAPI wouldn't even
know that it isn't acquiring audio from a real microphone; instead I
would have a fake audio device driver exposing a fake microphone that
my SAPI session would point to for audio acquisition. The downside to
that approach is that writing an audio device driver is a pain in the
butt. There are many assumptions about what can and can't be done as
far as memory management is concerned, and getting an RTP stream to be
accessible under that set of assumptions would be a challenge, to say
the least.
2) Implement a custom ISpAudio interface for SAPI. That approach seems
more realistic.
What are your thoughts? Has anyone done that before?
Something like...
SpAudioFormat lSAF = new SpAudioFormat();
lSAF.Type = SpeechLib.SpeechAudioFormatType.SAFT11kHz16BitMono;
mSRAudioIn = new SpMemoryStream();
mSRAudioIn.Format = lSAF;
mRecognizer = new SpInprocRecognizer();
mRecognizer.AudioInputStream = (ISpeechBaseStream)mSRAudioIn;
Then you could just read from your RTP/RTCP stream and write to the SR
stream using mSRAudioIn.Write().
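For completeness, here's roughly what that wiring looks like end to end. This is my own untested sketch against the SAPI 5.1 SpeechLib interop (the class and method names are mine); Write() just passes the managed byte[] through the automation layer as a VARIANT, and the payload must already match the declared stream format:

```csharp
using System;
using SpeechLib; // COM interop assembly for SAPI 5.1

class MemoryStreamReco
{
    private SpMemoryStream mSRAudioIn;
    private SpInprocRecognizer mRecognizer;
    private SpInProcRecoContext mContext;

    public void Init()
    {
        // Declare the format the engine should expect on the stream.
        SpAudioFormat lSAF = new SpAudioFormat();
        lSAF.Type = SpeechAudioFormatType.SAFT11kHz16BitMono;

        mSRAudioIn = new SpMemoryStream();
        mSRAudioIn.Format = lSAF;

        mRecognizer = new SpInprocRecognizer();
        mRecognizer.AudioInputStream = (ISpeechBaseStream)mSRAudioIn;

        mContext = (SpInProcRecoContext)mRecognizer.CreateRecoContext();
        mContext.Recognition +=
            new _ISpeechRecoContextEvents_RecognitionEventHandler(OnRecognition);
    }

    // Feed each decoded RTP packet's audio payload to the stream.
    public void OnRtpPayload(byte[] buffer)
    {
        mSRAudioIn.Write(buffer);
    }

    private void OnRecognition(int streamNumber, object streamPosition,
        SpeechRecognitionType recognitionType, ISpeechRecoResult result)
    {
        Console.WriteLine(result.PhraseInfo.GetText(0, -1, true));
    }
}
```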
"Philippe Roy" <speec...@googlemail.com> wrote in message
news:1172943546....@j27g2000cwj.googlegroups.com...
I have attempted to get your idea working but the SR engine does not see the
data that I've written using mSRAudioIn.Write() as you describe. Here's my
abbreviated C# code using SAPI SDK 5.1:
using SpeechLib;
...
private SpMemoryStream AudioIn;
private SpeechLib.SpInProcRecoContext recctx;
private SpeechLib.ISpeechRecoGrammar grammar;
private SpeechLib.SpInprocRecognizer recognizer;
private const int grammarId = 10;
SpAudioFormat Tp = new SpAudioFormat();
Tp.Type = SpeechAudioFormatType.SAFTADPCM_8kHzMono;
AudioIn = new SpMemoryStream();
AudioIn.Format = Tp;
recognizer = new SpInprocRecognizer();
recognizer.AudioInputStream = (SpeechLib.ISpeechBaseStream)AudioIn;
recctx = (SpeechLib.SpInProcRecoContext)recognizer.CreateRecoContext();
recctx.SoundStart += new _ISpeechRecoContextEvents_SoundStartEventHandler(recctx_SoundStart);
recctx.StartStream += new _ISpeechRecoContextEvents_StartStreamEventHandler(recctx_StartStream);
recctx.Hypothesis += new _ISpeechRecoContextEvents_HypothesisEventHandler(recctx_Hypothesis);
recctx.Interference += new _ISpeechRecoContextEvents_InterferenceEventHandler(recctx_Interference);
recctx.Recognition += new _ISpeechRecoContextEvents_RecognitionEventHandler(recctx_Recognition);
grammar = recctx.CreateGrammar(grammarId);
grammar.DictationLoad("", SpeechLoadOption.SLOStatic);
grammar.DictationSetState(SpeechRuleState.SGDSActive);
// Event handlers omitted for brevity...
....
I have a callback which gets RTP data periodically:
void Mycallback( byte[] buffer )
{
    AudioIn.Write( buffer ); // parameter is "buffer"; "Buffer" wouldn't compile
}
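My only guess so far (unverified) is that Write() advances the stream's single seek pointer, so after each write the engine may be left reading at end-of-stream. Here's a diagnostic variant of the callback I may try, which saves and restores the position around each write; SpeechStreamSeekPositionType comes from the SpeechLib interop, and Seek() takes and returns positions as VARIANT-typed objects:

```csharp
// Diagnostic sketch (untested): keep separate read/write offsets so the
// engine's read position isn't left at end-of-stream after each write.
private object writePos = 0;

void Mycallback( byte[] buffer )
{
    lock (AudioIn) // note: this doesn't guard the engine's own reads
    {
        // Seek(0, relative-to-current) returns the current pointer,
        // i.e. where the engine last read up to.
        object readPos = AudioIn.Seek(0,
            SpeechStreamSeekPositionType.SSSPTRelativeToCurrentPosition);

        // Append at the previous write offset, then remember the new end.
        AudioIn.Seek(writePos, SpeechStreamSeekPositionType.SSSPTRelativeToStart);
        AudioIn.Write(buffer);
        writePos = AudioIn.Seek(0,
            SpeechStreamSeekPositionType.SSSPTRelativeToCurrentPosition);

        // Put the pointer back where the engine expects it.
        AudioIn.Seek(readPos, SpeechStreamSeekPositionType.SSSPTRelativeToStart);
    }
}
```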
To verify the data, I've written it to a file instead of the memory
stream; I then used sox to convert it to a wav file, and it plays back
fine. The curious thing here is that if I replace the SpMemoryStream
with an SpFileStream, the SR engine will see and process data if the
file has leftover data from a previous run, but any new data I write
during execution (using AudioIn.Write()) never gets processed.
Note that I've also tried the audio format SAFTCCITT_uLaw_8kHzMono, to
no avail. I'm reasonably sure the input data is 8kHz, one-channel,
8-bit uLaw data. The way I decoded it with sox is:
sox.exe -U -b -r 8000 -c 1 filename.raw filename.wav
Any clues, SAPI gurus out there?
Pete
"Merlin" <bill...@hotmail.com> wrote in message
news:73EA6B6C-FD80-43D1...@microsoft.com...