wav file reco to text

Hai Xu

unread,

Jan 13, 2004, 11:05:35 PM1/13/04

to

Hello,

The following source code shows a simple console program that takes wav file
to reco into text.

however the statement
" hr = cpRecoResult->GetPhrase((SPPHRASE**)&pPhrase); "
crashes the program with segmentation violation error,
any one knows why ?

Thanks

Hai

I've removed all status check and comments to make it shorter
================
#include "stdafx.h"
#include <windows.h>

#include <sapi.h>
#include <spdebug.h>
// SAPI Header Files
#include <sphelper.h>
#include <spddkhlp.h>

int main(int argc, char* argv[])
{
HRESULT hr;
CComPtr<ISpStream> cpInputStream;
CComPtr<ISpRecognizer> cpRecognizer;
CComPtr<ISpRecoContext> cpRecoContext;
CComPtr<ISpRecoGrammar> cpRecoGrammar;

CComPtr<ISpRecoResult> cpRecoResult;
CComPtr<ISpPhrase> pPhrase;
WCHAR *pwszText;
CoInitialize(NULL);

hr = cpInputStream.CoCreateInstance(CLSID_SpStream);

CSpStreamFormat sInputFormat(SPSF_22kHz8BitMono, &hr);

hr = cpInputStream->BindToFile(L"test.wav",
SPFM_OPEN_READONLY,
&sInputFormat.FormatId(),
sInputFormat.WaveFormatExPtr(),
SPFEI_ALL_EVENTS); // SPFEI_ALL_EVENTS
hr = cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer);
hr = cpRecognizer->SetInput(cpInputStream, TRUE);
hr = cpRecognizer->CreateRecoContext(&cpRecoContext);
hr = cpRecoContext->CreateGrammar(NULL, &cpRecoGrammar);
hr = cpRecoGrammar->LoadDictation(NULL,SPLO_STATIC);
hr = cpRecoContext->SetInterest(SPFEI(SPEI_RECOGNITION) |
SPFEI(SPEI_END_SR_STREAM), SPFEI(SPEI_RECOGNITION) |
SPFEI(SPEI_END_SR_STREAM));
hr = cpRecoContext->SetNotifyWin32Event();
hr = cpRecoGrammar->SetDictationState(SPRS_ACTIVE);
BOOL fEndStreamReached = FALSE;
while (!fEndStreamReached)
{
hr = cpRecoContext->WaitForNotifyEvent(INFINITE);

CSpEvent spEvent;
while (!fEndStreamReached && S_OK == spEvent.GetFrom(cpRecoContext))
{
switch (spEvent.eEventId)
{
case SPEI_RECOGNITION:
hr = cpRecoResult->GetPhrase((SPPHRASE**)&pPhrase);
hr = pPhrase->GetText(SP_GETWHOLEPHRASE,
SP_GETWHOLEPHRASE, TRUE, &pwszText, NULL);
printf("result ==> %s\n", pwszText);
break;

case SPEI_END_SR_STREAM:
fEndStreamReached = TRUE;
break;
}
spEvent.Clear();
}
}
hr = cpRecoGrammar->SetDictationState(SPRS_INACTIVE);
hr = cpRecoGrammar->UnloadDictation();
hr = cpInputStream->Close();
cpRecognizer.Release();
CoUninitialize();

return 0;
}
===============================

Dave Wood [MS]

unread,

Jan 14, 2004, 12:56:41 PM1/14/04

to

Unless I'm missing something, you aren't initializing cpRecoResult to
anything when the SPEI_RECOGNITION event occurs. Try adding a line to get
the result:
cpRecoResult = spEvent.RecoResult();

and then one after processing that result to release the result
cpRecoResult.Release();

--

This posting is provided "AS IS" with no warranties, and confers no rights.

"Hai Xu" <h...@macrosoftinc.com> wrote in message
news:ehuXTOl2...@TK2MSFTNGP11.phx.gbl...

Hai Xu

unread,

Jan 14, 2004, 3:32:59 PM1/14/04

to

Hi, Dave
Thanks for your response.
However, after I add the line you suggested
cpRecoResult = spEvent.RecoResult();
The next line gets crashed, which is:
hr = pPhrase->GetText(SP_GETWHOLEPHRASE,SP_GETWHOLEPHRASE, TRUE, &pwszText,
NULL);

Any idea why ?

Thanks

Hai

"Dave Wood [MS]" <dave...@online.microsoft.com> wrote in message
news:O3mvEfs2...@tk2msftngp13.phx.gbl...

Dave Wood [MS]

unread,

Jan 14, 2004, 3:48:31 PM1/14/04

to

Ah I see. GetPhrase is returning a structure {SPPHRASE}. You are casting
this to a COM pointer {ISpPhrase} and then trying to call a method on it.
This is definitely going to crash. I think you just want to call GetText on
cpRecoResult directly.

--

This posting is provided "AS IS" with no warranties, and confers no rights.

"Hai Xu" <h...@macrosoftinc.com> wrote in message

news:eXnFQyt2...@tk2msftngp13.phx.gbl...

Hai Xu

unread,

Jan 14, 2004, 4:44:33 PM1/14/04

to

Hi, Dave

Thank you very much for your great help.
Well, I now get some result, however, so terrible that makes me think this
may not be good way to go with.
Also, WaitForNotifyEvent(INFINITE); keeps returning S_OK that makes
spEvent.GetFrom() crashed, do you
have any suggestion for me to do this simple task ?

The basic job I want to do is to take a wave file, then reco its content to
text.
Is my code mostly OK, or where did I miss anything ?

Thanks again

Hai

"Dave Wood [MS]" <dave...@online.microsoft.com> wrote in message

news:OzblH$t2DHA...@TK2MSFTNGP09.phx.gbl...

Hai Xu

unread,

Jan 14, 2004, 10:02:02 PM1/14/04

to

Thanks for the help.
I've figured out the way to build it.
The code is included below.
This code is for a quick testing, not every point is checked against hr
status.

==================
// WaveASR.cpp : take a wav file
// reco the content into text
// print out the result into the console

#include <windows.h>

// SAPI Header Files
#include <sapi.h>
#include <spdebug.h>
#include <sphelper.h>
#include <spddkhlp.h>

int main(int argc, char* argv[])
{
HRESULT hr;

printf("Start wav file recognizing ...\n");

CComPtr<ISpStream> cpInputStream;
CComPtr<ISpRecognizer> cpRecognizer;
CComPtr<ISpRecoContext> cpRecoContext;
CComPtr<ISpRecoGrammar> cpRecoGrammar;
CComPtr<ISpRecoResult> cpRecoResult;
CComPtr<ISpPhrase> pPhrase;

WCHAR *pwszText;

CoInitialize(NULL);

// Create basic SAPI stream object
// NOTE: The helper SpBindToFile can be used to perform the following
operations
hr = cpInputStream.CoCreateInstance(CLSID_SpStream);
// Check hr

// set wav format as 22kHz, 16-bit, Mono
CSpStreamFormat sInputFormat(SPSF_22kHz16BitMono, &hr);

// setup stream object with a wav file name
// for read-only access, since it will only be access by the SR engine
// if wav file is in other directory, use L"C:\\temp\\test.wav"
// L specifies unicode, same to (LPCWSTR)

hr = cpInputStream->BindToFile(L"test.wav",
SPFM_OPEN_READONLY,
&sInputFormat.FormatId(),
sInputFormat.WaveFormatExPtr(),
SPFEI_ALL_EVENTS);

if(hr == S_OK)
{
}
else if(hr == E_INVALIDARG)
{
printf("E_INVALIDARG\n");
}
else if(hr == E_OUTOFMEMORY)
{
printf("E_OUTOFMEMORY\n");
}
else if(hr == STG_E_FILENOTFOUND)
{
printf("STG_E_FILENOTFOUND\n");
}
else if(hr == SPERR_ALREADY_INITIALIZED)
{
printf("SPERR_ALREADY_INITIALIZED\n");
}
else if(FAILED(hr))
{
printf("cpInputStream->BindToFile failed !\n");
}

// Create in-process speech recognition engine
hr = cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer);
// Check hr
if(hr == S_OK)
{
}
else if(hr == E_INVALIDARG)
{
printf("E_INVALIDARG\n");
}
else if(hr == SPERR_ENGINE_BUSY)
{
printf("SPERR_ENGINE_BUSY\n");
}
else if(FAILED(hr))
{
printf("cpRecognizer.CoCreateInstance failed !\n");
}
// connect wav input to recognizer
// SAPI will negotiate mismatched engine/input audio formats
// using system audio codecs, so second parameter is not
// important - use default of TRUE

hr = cpRecognizer->SetInput(cpInputStream, TRUE);

// Create recognition context to receive events
hr = cpRecognizer->CreateRecoContext(&cpRecoContext);

// Create grammar, and load dictation
// ignore grammar ID for simplicity's sake
// NOTE: Voice command apps would load CFG here

hr = cpRecoContext->CreateGrammar(NULL, &cpRecoGrammar);
hr = cpRecoGrammar->LoadDictation(NULL,SPLO_STATIC);

// check for recognitions and end of stream event

hr = cpRecoContext->SetInterest(SPFEI(SPEI_RECOGNITION) |
SPFEI(SPEI_END_SR_STREAM), SPFEI(SPEI_RECOGNITION) |
SPFEI(SPEI_END_SR_STREAM));

// use Win32 events for command-line style application
hr = cpRecoContext->SetNotifyWin32Event();

// activate dictation, and begin recognition
hr = cpRecoGrammar->SetDictationState(SPRS_ACTIVE);
// Check hr
if(hr == S_OK)
{
}
else if(hr == E_INVALIDARG)
{
printf("E_INVALIDARG\n");
}
else if(hr == SP_STREAM_UNINITIALIZED)
{
printf("SP_STREAM_UNINITIALIZED\n");
}
else if(hr == SPERR_UNINITIALIZED)
{
printf("SPERR_UNINITIALIZED\n");
}
else if(hr == SPERR_UNSUPPORTED_FORMAT)
{
printf("SPERR_UNSUPPORTED_FORMAT\n");
}
else if(FAILED(hr))
{
printf("cpRecoGrammar->SetDictationState(SPRS_ACTIVE) failed !\n");
}

// while events occur, continue processing
// timeout should be greater than the audio stream length,
// or a reasonable amount of time expected to pass before
// no more recognitions are expected in an audio stream
// INFINITE is a little risky for hanging the program
BOOL fEndStreamReached = FALSE;
while (!fEndStreamReached && S_OK ==
cpRecoContext->WaitForNotifyEvent(INFINITE)) // set time out 60 seconds here
{
CSpEvent spEvent;
// pull all queued events from the reco context's event queue

while (!fEndStreamReached && S_OK == spEvent.GetFrom(cpRecoContext))
{

// Check event type
switch (spEvent.eEventId)
{
// speech recognition engine recognized some audio
case SPEI_RECOGNITION:
// get result in even queue
cpRecoResult = spEvent.RecoResult();
//hr = cpRecoResult->GetPhrase((SPPHRASE**)&pPhrase);
// get the phrase's entire text string, including replacements
//hr = pPhrase->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, TRUE,
&pwszText, NULL);
hr = cpRecoResult->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, TRUE,
&pwszText, NULL);
// get the phrase's first 2 words, excluding replacements
//hr = pPhrase->GetText(pPhrase->Rule.ulFirstElement, 2, FALSE,
&pwszText, NULL);
//int nLen = WideCharToMultiByte(CP_ACP, 0, pwszText, -1, 0, 0, 0, 0);
wprintf(L"result ==> %s\n", pwszText);
cpRecoResult.Release();
break;

// end of the wav file was reached by the speech recognition engine

case SPEI_END_SR_STREAM:
fEndStreamReached = TRUE;
break;
}

// clear any event data/object references
spEvent.Clear();
}// END event pulling loop - break on empty event queue OR end stream
}// END event polling loop - break on event timeout OR end stream

// deactivate dictation
hr = cpRecoGrammar->SetDictationState(SPRS_INACTIVE);

// unload dictation topic
hr = cpRecoGrammar->UnloadDictation();

// close the input stream, since we're done with it
// NOTE: smart pointer will call SpStream's destructor,
// and consequently ::Close, but code may want to check
// for errors on ::Close operation
hr = cpInputStream->Close();

cpRecognizer.Release();
CoUninitialize();

return 0;
}
==================
Hai

"Hai Xu" <h...@macrosoftinc.com> wrote in message

news:eGpNPau2...@tk2msftngp13.phx.gbl...

liuy...@163.com

unread,

May 25, 2014, 8:24:35 AM5/25/14

to

I am glad to see your code. i am interested to wav ASR too,and i also use Microsoft Speech SDK 5.1,i am in China,so i mainly used to recognize Chinese. i readed your code and i tried to execute it.But i can just succeed to recognized only the number.Please tell me what i should do to recognize Chinese! i look forward to hearing from you soon!

liuy...@163.com

unread,

May 25, 2014, 8:26:25 AM5/25/14

to

>Hi,Xu

I am glad to see your code. i am interested to wav ASR too,and i also use Microsoft Speech SDK 5.1,i am in China,so i mainly used to recognize Chinese. i readed your code and i tried to execute it.But i can just succeed to recognized only the number.Please tell me what i should do to recognize Chinese!

I look forward to hearing from you soon!