Continuous speech recognition in (Uni)MRCP


Vali

Sep 23, 2010, 6:02:49 AM
to UniMRCP
Hi Arsen,

We would like to implement distributed continuous speech recognition.
MRCP is an obvious choice for this, but unfortunately the spec does
not say much on this topic. Our scenario is not the typical
RECOGNIZE <start-speech>-<end-speech> RESULT
but rather
RECOGNIZE <start-speech> PARTIAL_RESULT <speech> ... <end-speech>
RESULT

I am thinking about two approaches to achieve this:

1. Periodically sending an IN-PROGRESS message whose body contains
partial results. It should be harmless, since it does not change the
state of the recognizer, but I am not sure it is allowed in the
RECOGNIZING state. Maybe START-OF-INPUT could be used instead, but is
sending it more than once allowed?

2. Polling the server from the client. After the first RECOGNIZE, the
client periodically sends further RECOGNIZE requests (again without
changing the state), and the server responds with messages containing
partial results in their bodies.

Which approach do you find cleaner? Or do you have any other idea?

We could also send RECOGNITION-COMPLETE whenever we have a partial
result and continue recognizing the stream while waiting for the next
RECOGNIZE request, but I find this ugly.

Thanks for your thoughts,
- Vali

Arsen Chaloyan

Dec 8, 2010, 8:29:18 PM
to uni...@googlegroups.com
Hi Vali,

I'd definitely vote for the first approach, that is, sending
provisional responses with or without partial or preliminary results.
For instance, it's quite legitimate to send one or more provisional
TRYING responses in SIP. These responses may or may not contain
message bodies. So why not use the same concept in the MRCP dialog,
as follows:

C->S: RECOGNIZE
S->C: IN-PROGRESS
S->C: START-OF-INPUT
S->C: IN-PROGRESS partial_result_1
S->C: IN-PROGRESS partial_result_n
S->C: RECOGNITION-COMPLETE final_result

Moreover, this would be a nice addition to the spec, I think.
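
The proposed exchange can be sketched as a small Python generator. This is only an illustration of the message order, not UniMRCP code; the function name `proposed_flow` and the (message, body) tuple shape are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple


def proposed_flow(partials: Iterable[str], final: str) -> Iterator[Tuple[str, str]]:
    """Yield the (message, body) pairs the server would send after RECOGNIZE."""
    yield ("IN-PROGRESS", "")        # initial response to the RECOGNIZE request
    yield ("START-OF-INPUT", "")     # speech has been detected
    for text in partials:
        # Further provisional messages, each carrying a partial result.
        yield ("IN-PROGRESS", text)
    yield ("RECOGNITION-COMPLETE", final)


if __name__ == "__main__":
    for name, body in proposed_flow(["partial_result_1", "partial_result_n"],
                                    "final_result"):
        print(f"S->C: {name} {body}".rstrip())
```

Only the final RECOGNITION-COMPLETE changes the recognizer state; the intermediate messages are purely informational.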


--
Arsen Chaloyan
The author of UniMRCP
http://www.unimrcp.org

Vali

Dec 19, 2010, 9:29:17 AM
to UniMRCP
Hi Arsen,

I agree this is a nice solution. I tried to implement it, and only
START-OF-INPUT messages worked. In other words, I was not able to send
a message in the S->C direction with any method other than
START-OF-INPUT (OK, there is RECOGNITION-COMPLETE, but that would
change the state). For example, sending another IN-PROGRESS response
to a RECOGNIZE request is not possible. But this is sufficient for now.

There should also be an option to turn on this behaviour, for
example Recognition-Mode: lvcsr (as another alternative to normal and
hotword). I am going to solve it with Vendor-Specific-Params for now.
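
A minimal sketch of how such a switch could be carried in a vendor-specific header, assuming the `name=value` pairs separated by `;` that the MRCP drafts describe for Vendor-Specific-Parameters. The parameter name `com.example.recognition-mode` and the helper `vendor_specific_header` are hypothetical and not part of UniMRCP or the spec.

```python
from typing import Dict


def vendor_specific_header(params: Dict[str, str]) -> str:
    """Format a Vendor-Specific-Parameters header line from name/value pairs."""
    pairs = []
    for name, value in params.items():
        # Reject characters that would break the "name=value;name=value" syntax.
        if "=" in name or ";" in name or ";" in value:
            raise ValueError(f"invalid vendor-specific pair: {name!r}={value!r}")
        pairs.append(f"{name}={value}")
    return "Vendor-Specific-Parameters: " + ";".join(pairs)


if __name__ == "__main__":
    # Hypothetical parameter that turns on continuous (lvcsr) recognition.
    print(vendor_specific_header({"com.example.recognition-mode": "lvcsr"}))
```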

By the way, the final_result is not important for our application
(long dictation, perhaps minutes). A typical scenario will look like
this:
C->S: RECOGNIZE
S->C: IN-PROGRESS
S->C: START-OF-INPUT
S->C: START-OF-INPUT result_1
S->C: START-OF-INPUT result_n
C->S: STOP
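
On the client side, this scenario could be handled roughly as follows. This is a sketch under the assumption that a START-OF-INPUT event with a non-empty body carries an intermediate result, while one with an empty body is the ordinary start-of-speech notification; the name `collect_results` and the (event, body) tuples are made up for the example.

```python
from typing import Iterable, List, Tuple


def collect_results(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Gather intermediate results delivered in repeated START-OF-INPUT events."""
    results: List[str] = []
    for name, body in events:
        if name == "START-OF-INPUT" and body:
            # Repeated START-OF-INPUT with a body: treat it as a partial result.
            results.append(body)
        # IN-PROGRESS and a body-less START-OF-INPUT carry nothing to collect.
    return results


if __name__ == "__main__":
    received = [
        ("IN-PROGRESS", ""),
        ("START-OF-INPUT", ""),
        ("START-OF-INPUT", "result_1"),
        ("START-OF-INPUT", "result_n"),
    ]
    print(collect_results(received))
    # The client then ends the dictation itself by sending STOP.
```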

As you said, this would be a nice addition to the spec. Do you have any
idea how such a proposal is submitted? I would just send a suggestion
by e-mail to the authors mentioned here:
http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-22#appendix-B

Thanks for your thoughts,
- Vali



Arsen Chaloyan

Dec 21, 2010, 2:30:55 AM
to uni...@googlegroups.com
Hi Vali,

Thinking a bit more about the topic, I have come to the conclusion that
an asynchronous event from the server to the client (something similar
to, or intermediate between, the START-OF-INPUT and RECOGNITION-COMPLETE
events) would be a better match for your case. And this is what you have
already done. I see you have gone even further with a new
recognition mode, which looks reasonable to me too.

You could try sending this suggestion to the authors of the spec
directly. Perhaps it would be better to send it to the speechsc mailing
list instead.

https://www1.ietf.org/mailman/listinfo/speechsc

Good luck!
