recognition-complete event follows start-of-speech with a 0.2 second time-lag


dank

Oct 22, 2008, 8:05:30 AM
to UniMRCP
Hello .*

I am currently working on an MRCP ASR client that communicates with a
Loquendo Speech Server. At the moment it is just a hacked version of
unimrcpclient, adapted to work with Loquendo and used for testing. A
typical recognition session looks like this:

2008-10-14 15:13:31:500600 [INFO] Set [demo.pcm] as Speech Source
2008-10-14 15:13:31:502600 [INFO] Send MRCP Request to Server
<000026A048F49AC7> [1]
2008-10-14 15:13:31:505600 [INFO] Send MRCPv2 Message size=483
MRCP/2.0 483 RECOGNIZE 1
Channel-Identifier:000026A048F49AC7@speechrecog
Cancel-If-Queue:false
Content-Type:application/srgs+xml
Content-Id:requ...@form-level.store
Content-Length:290

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
xml:lang="de-DE" version="1.0" root="rootRule">
<rule id="rootRule"> <one-of>
<item>Test</item>
<item>eins</item>
<item>zwei</item>
<item>drei</item>
</one-of>
</rule></grammar>

2008-10-14 15:13:31:549600 [INFO] Receive MRCPv2 Message size=83
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: 000026A048F49AC7@speechrecog

2008-10-14 15:13:41:468600 [INFO] Receive MRCPv2 Message size=148
MRCP/2.0 148 START-OF-INPUT 1 IN-PROGRESS
Channel-Identifier: 000026A048F49AC7@speechrecog
Proxy-Sync-Id: 041E363400000039
Input-Type: speech

2008-10-14 15:13:41:474600 [INFO] Send MRCP Event to Application
<000026A048F49AC7>
2008-10-14 15:13:41:667600 [INFO] Receive MRCPv2 Message size=622
MRCP/2.0 622 RECOGNITION-COMPLETE 1 COMPLETE
Channel-Identifier: 000026A048F49AC7@speechrecog
Proxy-Sync-Id: 041E363400000039
Completion-Cause: 001 no-match
Content-Type: application/nlsml+xml
Content-Length: 401

<?xml version="1.0" encoding="UTF-8"?>
<result grammar="requ...@form-level.store#rootRule">
<interpretation grammar="requ...@form-level.store#rootRule"
confidence="0.026392">
<instance confidence="0.026392">
zwei
</instance>
<input mode="speech" confidence="0.026392">
<nomatch>zwei</nomatch>
<input confidence="0.026392">
zwei
</input>
</input>
</interpretation>
</result>

If you look at the timestamps you can see two major problems:

1. The START-OF-INPUT (start-of-speech) event was received about 10
seconds after the IN-PROGRESS response, although the file I am
streaming contains continuous speech from the very beginning, lasting
about 15 seconds.

2. The RECOGNITION-COMPLETE event follows START-OF-INPUT only 0.2
seconds later, although around 5 seconds of speech are still left.

I've read the RFC and looked through the documentation of my MRCP
server but couldn't find any useful hints.
So I wonder whether this is really an MRCP-related issue. If so, where
can I find more information to solve this problem? Has anyone
experienced similar behaviour?

Thanks & regards,

Daniel

Arsen Chaloyan

Oct 22, 2008, 10:07:35 AM
to uni...@googlegroups.com
Hi Daniel,

On Wed, Oct 22, 2008 at 5:05 PM, dank <danie...@dai-labor.de> wrote:
>
> Hello .*
>
> I am currently working on a MRCP ASR client which is communicating
> with a Loquendo Speech Server. At the moment there is just a hacked
> version of the unimrcpclient to make it compatible with Loquendo and
> which is used for testing.

I assume the hack is in the hard-coded demo recognition scenario.
Ideally you would use a scripting language capable of describing
custom MRCP scenarios instead, but that is another story.

The timestamps do look strange. A network capture (Wireshark/tcpdump)
may shed light on what is actually going on: compare the timestamps
in the console output with those in the capture.

> I've read the RFC and looked through the documentation of my MRCP
> server but couldn't find any useful hints.
> That's why I wonder whether this is really a MRCP related issue? If
> yes, where can I find more informations to solve this problem? Does
> anyone experienced similar behaviour?

I have no Loquendo server to test against; anyway, try playing with the
Start-Input-Timers header and/or the recognition grammar you supplied,
since you received Completion-Cause: 001 no-match.
Also, the demo.pcm you provide is assumed to be 8 kHz, 16-bit linear PCM.
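Experimenting with a header such as Start-Input-Timers means editing the RECOGNIZE request, and the one subtle part is the message-length field in the start line: it counts the whole message, including its own digits (the log above shows `MRCP/2.0 483 RECOGNIZE 1` for a 483-byte message). A minimal Python sketch of that computation; `build_mrcp_request` is a hypothetical helper, not part of UniMRCP, the header name follows the MRCPv2 specification, and whether Loquendo honours it is untested here:

```python
def build_mrcp_request(method, request_id, headers, body=b""):
    """Serialize an MRCPv2 request whose message-length field counts the
    whole message, including the digits of the length itself.

    Hypothetical helper for experiments; not part of UniMRCP.
    """
    headers = list(headers)
    if body:
        headers.append(("Content-Length", str(len(body))))
    header_block = "".join(f"{name}: {value}\r\n" for name, value in headers)
    # Headers end with an empty line, followed by the optional body.
    rest = header_block.encode("ascii") + b"\r\n" + body

    # The length sits inside the start line, so iterate until the number
    # of digits stops changing (a few rounds at most).
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


grammar = (
    b'<?xml version="1.0"?>\r\n'
    b'<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="de-DE" '
    b'version="1.0" root="rootRule"><rule id="rootRule"><one-of>'
    b"<item>eins</item><item>zwei</item></one-of></rule></grammar>"
)

request = build_mrcp_request(
    "RECOGNIZE",
    1,
    [
        ("Channel-Identifier", "000026A048F49AC7@speechrecog"),
        # Per MRCPv2, "false" defers the no-input timer until a
        # START-INPUT-TIMERS request; "true" starts it immediately.
        ("Start-Input-Timers", "true"),
        ("Content-Type", "application/srgs+xml"),
    ],
    body=grammar,
)
print(request.decode("ascii"))
```

Iterating instead of computing the digit count in closed form keeps the code obviously correct; the loop only grows the length, so it settles after at most a few rounds.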

HTH,
Arsen.



Daniel Käs

Nov 5, 2008, 10:16:53 AM
to uni...@googlegroups.com
Hello everyone,

I've played around with the Loquendo settings and spoken with the support
team. At the moment it seems to be an encoding problem. The Loquendo
Speech Suite comes with a demo program that I can run locally on my
server to test the ASR engine without MRCP.
I discovered that Loquendo expects streams encoded with the European
A-law standard; with audio files in this encoding (8-bit A-law, 8 kHz)
recognition works perfectly.

Now I want to stream my audio files over MRCP instead of using the demo
program. If I just rename the files that worked with the demo to
"demo.pcm" and run my MRCP client, I still get errors
(Completion-Cause: recognizer-error or no-match-maxtime).
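One quick sanity check is to compare the file size with the duration the client will assume. The demo client reads demo.pcm as raw 8 kHz, 16-bit linear PCM, i.e. 16,000 bytes per second, while 8 kHz A-law has 8,000 bytes per second, so a renamed A-law file is read as half its real length and essentially as noise, which would be consistent with the errors above. A minimal Python sketch; `raw_duration_seconds` and `describe` are hypothetical helpers:

```python
import os


def raw_duration_seconds(num_bytes, sample_rate_hz=8000, bytes_per_sample=2, channels=1):
    """Duration of headerless audio, given its size and the assumed format."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return num_bytes / bytes_per_second


def describe(path):
    """Report how long a raw file would last under both interpretations."""
    size = os.path.getsize(path)
    as_l16 = raw_duration_seconds(size, bytes_per_sample=2)
    as_alaw = raw_duration_seconds(size, bytes_per_sample=1)
    return f"{path}: {as_l16:.2f} s if 16-bit linear, {as_alaw:.2f} s if 8-bit A-law"
```

For example, 15 seconds of 8 kHz A-law speech is 120,000 bytes, which `raw_duration_seconds(120000)` reports as 7.5 s when the file is taken to be 16-bit linear PCM.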

Furthermore, over MRCP the same audio data in signed linear encoding is
at least processed: the MRCP server always answers with "no-match" and a
very low confidence score, but it does attempt recognition.
Loquendo's demo program, fed the same linear file, fails without ever
raising a speech-detected event, which sounds very much like an
encoding issue.

So here is my question:
Does UniMRCP expect a specific encoding? If so, where is this
documented, and where can I adjust the default encoding settings?

Thanks & regards,

Daniel


Arsen Chaloyan

Nov 5, 2008, 12:29:22 PM
to uni...@googlegroups.com
Hi Daniel,

If you need to transmit an A-law encoded stream to the server using the
unimrcpclient demo application as is, do the following:
1. Supply demo.pcm in 16-bit, 8 kHz linear PCM format as input.
2. Modify unimrcpclient.xml to explicitly offer only the PCMA codec:
Find
<!-- <param name="codecs" value="PCMU PCMA L16/96/8000"/> -->
Replace with
<param name="codecs" value="PCMA"/>

This should be enough. As a result, an A-law (8-bit, 8 kHz) stream will
be transmitted to the server.
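With PCMA negotiated, the client converts each 16-bit linear sample into an 8-bit A-law code before sending it, halving the bitrate from 128 kbit/s (L16) to 64 kbit/s. The sketch below shows the standard ITU-T G.711 A-law companding step in Python, modeled on the widely circulated Sun reference implementation (g711.c); it illustrates the algorithm and is not UniMRCP's own code. It assumes little-endian 16-bit input, which is what a raw .pcm file produced on x86 usually contains.

```python
import struct

# Upper bounds of the eight A-law segments, applied to the 13-bit magnitude.
SEG_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(sample):
    """Encode one signed 16-bit linear sample as an 8-bit G.711 A-law code."""
    value = sample >> 3  # A-law works on 13-bit magnitudes
    if value >= 0:
        mask = 0xD5  # sign bit set for non-negative values, even bits inverted
    else:
        mask = 0x55
        value = -value - 1
    # Find the segment (chord) that contains the magnitude.
    seg = next((i for i, end in enumerate(SEG_END) if value <= end), 8)
    if seg >= 8:  # out of range: clip to the largest code
        return 0x7F ^ mask
    code = seg << 4
    # Four quantization bits inside the segment.
    code |= (value >> (1 if seg < 2 else seg)) & 0x0F
    return code ^ mask


def encode_pcm16le_to_alaw(data):
    """Convert raw little-endian 16-bit PCM bytes to A-law (1 byte per sample).

    struct.iter_unpack raises struct.error if len(data) is odd.
    """
    return bytes(linear_to_alaw(s) for (s,) in struct.iter_unpack("<h", data))
```

Silence (sample 0) maps to 0xD5 and full-scale positive/negative to 0xAA/0x2A, which are handy values to look for in a hex dump of a capture.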

Alternatively, you may want to supply demo.pcm in A-law format. In that
case, however, the code of the unimrcpclient demo application has to be
changed, as it currently expects the input stream in linear PCM format.
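If you would rather keep the demo client unchanged and still use A-law files that work with Loquendo's own tool, another option is to expand them back to 16-bit linear PCM before saving them as demo.pcm; with PCMA offered, the client then encodes them to A-law again on the wire. A minimal G.711 A-law decoder in Python, modeled on the Sun reference implementation; `convert_file` and the file names in its comment are hypothetical:

```python
import struct


def alaw_to_linear(code):
    """Decode one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55  # undo the even-bit inversion
    magnitude = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if seg > 1:
            magnitude <<= seg - 1
    return magnitude if code & 0x80 else -magnitude


def decode_alaw_to_pcm16le(data):
    """Convert A-law bytes to raw little-endian 16-bit PCM bytes."""
    samples = [alaw_to_linear(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def convert_file(alaw_path, pcm_path):
    # Hypothetical usage: convert_file("speech.alaw", "demo.pcm")
    with open(alaw_path, "rb") as src:
        pcm = decode_alaw_to_pcm16le(src.read())
    with open(pcm_path, "wb") as dst:
        dst.write(pcm)
```

The output file is twice the size of the input (two bytes per sample), matching the 8 kHz, 16-bit linear format the demo client reads.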

I don't expect any major issues, but if you still have interoperability
trouble, don't hesitate to contact me.
Moreover, if I get a trial or other version of the Loquendo server, I'll
try to find time to verify interoperability and make the results and the
required configuration publicly available to the community. That should
be in Loquendo's interest as well.

Regards,
Arsen.

Daniel Käs

Nov 6, 2008, 6:20:20 AM
to uni...@googlegroups.com
Hi Arsen,

Thank you for the fast answer; I had almost forgotten the config file ;-)
The problem seems to be solved: my timestamps now look good and the
Loquendo ASR does its job. The confidence score is quite low, but
that's a different story.

Thanks & regards,

Daniel

