Sending files


Lewis Bergman

Apr 28, 2021, 3:31:47 PM
to UniMRCP
I am discussing development with a programmer to use uniMRCP. He thinks that uniMRCP is only used in IVR situations with streaming media. I am pretty sure that the client used for testing just sends a file to the uniMRCP server. I tested the server with the client and it worked perfectly.
I guess I have some questions:
  1. Do many use uniMRCP?
  2. If so, do many using FreeSWITCH use mod_unimrcp?
  3. Can uniMRCP accept static sound files?
  4. Can it accept .wav files? <- My thought on this one is no and that the file might have to be base64 encoded, at least for the GSR plugin.
Looking for answers to push back on developer resistance to using uniMRCP. He wants to write code to Google. I would rather pay somebody to write code to something that lets me change the provider backend without having to rewrite everything that uses SR or TTS.
Thanks,

mayamatakeshi

Apr 28, 2021, 6:31:07 PM
to uni...@googlegroups.com
On Thu, Apr 29, 2021 at 4:31 AM Lewis Bergman <lewis....@gmail.com> wrote:
I am discussing development with a programmer to use uniMRCP. He thinks that uniMRCP is only used in IVR situations with streaming media. I am pretty sure that the client used for testing just sends a file to the uniMRCP server.

Which client is this? 
MRCP (v2) uses SIP to establish a media path between client and server, so audio is transmitted as RTP streams (audio file data has to be split into chunks and sent periodically to the server, as per the SDP negotiation).
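The chunking described above can be sketched roughly as follows. This assumes G.711 u-law at 8 kHz with the common 20 ms packetization; a real client would take these values from the negotiated SDP, and would also pace the packets in real time and wrap them in RTP headers.

```python
# Sketch: how a client might packetize raw u-law audio for RTP.
# Assumed values (not from any real session): 8 kHz mono, ptime=20.

SAMPLE_RATE = 8000        # samples per second (G.711 narrowband)
PTIME_MS = 20             # packet interval, typically negotiated via SDP
SAMPLES_PER_PACKET = SAMPLE_RATE * PTIME_MS // 1000   # 160 samples

def packetize(audio: bytes, frame_size: int = SAMPLES_PER_PACKET) -> list:
    """Split raw u-law audio (1 byte per sample) into RTP-sized payloads."""
    return [audio[i:i + frame_size] for i in range(0, len(audio), frame_size)]

# One second of u-law silence (0xFF) yields 50 payloads of 160 bytes each.
packets = packetize(b"\xff" * SAMPLE_RATE)
```

The key point is the pacing, not the splitting: each payload is sent only when its 20 ms slot comes up, which is why file transfer over RTP runs at the audio's own speed.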
 
I tested the server with the client and it worked perfectly.
I guess I have some questions:
  1. Do many use uniMRCP?
  2. If so, do many using FreeSWITCH use mod_unimrcp?
  3. Can uniMRCP accept static sound files?
  4. Can it accept .wav files? <- My thought on this one is no and that the file might have to be base64 encoded, at least for the GSR plugin.
Looking for answers to push back on developer resistance to using uniMRCP. He wants to write code to Google. I would rather pay somebody to write code to something that lets me change the provider backend without having to rewrite everything that uses SR or TTS.


Do you want to use uniMRCP to transcribe preexisting audio files?
If yes, I think this is not a good use case for uniMRCP. I don't know about MRCP v1, but with MRCP v2 you would establish a SIP session and have data transmission constrained by the audio rate negotiated in the call.
So a wav file with 2 minutes of audio would take 2 minutes just to be sent from client to server.
If instead you call the Google API directly, all the audio can be transmitted at once.
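The arithmetic behind the "2 minutes of audio takes 2 minutes" point is simple: RTP delivery is paced at the negotiated rate, so transfer time equals audio duration regardless of available bandwidth.

```python
# Transfer time for a paced RTP stream, assuming 20 ms packetization.
AUDIO_SECONDS = 120          # a 2-minute wav file
PTIME_MS = 20                # one packet every 20 ms

packets = AUDIO_SECONDS * 1000 // PTIME_MS        # 6000 packets
transfer_seconds = packets * PTIME_MS / 1000      # equals the audio length
```

A direct HTTP/gRPC upload, by contrast, is limited only by network bandwidth, so the same 2-minute file typically uploads in a few seconds.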
Because of the real-time nature of the usual scenario, there are lots of things an MRCP server has to do, like handling timers, RTP jitter, and VAD, to cope with human actors and SIP/RTP issues; these are not relevant for simple audio file transcription using Google (well, VAD is always relevant for SR engines).

Now, I checked the MRCP v2 RFC (RFC 6787) and found this:

9.4.10.  Input-Waveform-URI

   This optional header field specifies a URI pointing to audio content
   to be processed by the RECOGNIZE operation.  This enables the client
   to request recognition from a specified buffer or audio file.

   input-waveform-uri       =  "Input-Waveform-URI" ":" uri CRLF


So if uniMRCP supports this, then there might be a case to be made for using it in this scenario (you would need to serve the audio file over HTTP, assuming Input-Waveform-URI can be an HTTP URI, or upload it to the uniMRCP host's file system).
But I suspect it doesn't, as I tried to send this header to the uniMRCP server:

MRCP/2.0 232 RECOGNIZE 1
channel-identifier: 5251fff6aa614ecd@speechrecog
speech-language: ja-JP
content-type: text/uri-list
input-waveform-uri: http://192.168.3.138:7777/some.wav
content-length: 25

builtin:speech/transcribe

but it didn't do anything with it according to the logs.
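For reference, the header portion of a RECOGNIZE request like the one above could be assembled as sketched below, with Content-Length computed from the body. The channel identifier and URI are the hypothetical values from the example; the MRCP/2.0 start line is omitted because its message-length field depends on the final byte count of the whole message.

```python
# Sketch: building the header block of an MRCP v2 RECOGNIZE request.
# Values are the example ones from this thread, not a real session.

def build_recognize(channel_id: str, waveform_uri: str, body: str) -> str:
    """Return headers + body for a RECOGNIZE request (start line omitted)."""
    headers = "\r\n".join([
        f"channel-identifier: {channel_id}",
        "speech-language: ja-JP",
        "content-type: text/uri-list",
        f"input-waveform-uri: {waveform_uri}",
        f"content-length: {len(body.encode())}",   # 25 for the body below
    ])
    return headers + "\r\n\r\n" + body

msg = build_recognize("5251fff6aa614ecd@speechrecog",
                      "http://192.168.3.138:7777/some.wav",
                      "builtin:speech/transcribe")
```

Note that in MRCP v2 this message travels over the control channel established by SIP, not over a plain TCP socket you open yourself.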

Anyway, in the case of Google, its API is easy to use, so I would just call it directly; other providers like AWS, Azure, etc. should have similarly simple APIs, so I would not insert an MRCP layer just to handle static files.
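Calling Google directly also answers the base64 question from the original post: the REST `speech:recognize` endpoint of the Cloud Speech-to-Text v1 API expects the audio payload base64-encoded inside the JSON body (the gRPC client libraries take raw bytes instead). A minimal sketch of building such a request body, with placeholder bytes standing in for a real WAV file:

```python
# Sketch: JSON body for Google's REST speech:recognize endpoint.
# Field names follow the public v1 API; the audio bytes are placeholders.
import base64
import json

def make_recognize_body(wav_bytes: bytes, language: str = "en-US") -> str:
    """Base64-encode the audio and wrap it in the expected JSON shape."""
    return json.dumps({
        "config": {"languageCode": language},
        "audio": {"content": base64.b64encode(wav_bytes).decode("ascii")},
    })

body = make_recognize_body(b"RIFF....WAVEfmt ")  # placeholder, not real audio
```

A real call would POST this body to the v1 endpoint with an API key or OAuth token; the synchronous `recognize` method is limited to short audio (roughly a minute), with longer files going through the long-running variant instead.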
But some SR providers, like Nuance, Voxeo, etc., might only offer an MRCP interface (I have no idea), so you would be constrained by them, and then using uniMRCP would make sense, although it would be a poor use of resources: you would be dealing with a platform created for real-time processing when your use case doesn't seem to have real-time constraints.



Lewis Bergman

Apr 28, 2021, 6:40:51 PM
to uni...@googlegroups.com
Thank you for your thoughtful reply and point of view. That makes perfect sense and is in line with what the developer made a case for.

--
Lewis Bergman

Michael Levy

Apr 28, 2021, 7:52:40 PM
to uni...@googlegroups.com
The answer from mayamatakeshi is excellent.

One other way to think about this is that UniMRCP is an implementation of a client and server that support the MRCP protocol. MRCP is intended for use in telephony applications that require speech or recording services. It can be used to communicate with speech recognizers, speech synthesizers, speech authentication systems, and speech recorders.

Many IVRs are built using VXML, and commercial VXML interpreters (known as "voice browsers") support MRCP to communicate with speech recognizers and synthesizers.

Commercial TTS engines and ASRs also support MRCP, so you could have a Nuance recognizer that you could access with MRCP (though other APIs are typically available).

Telephony systems communicating with speech recognizers or synthesizers are a natural match for MRCP and UniMRCP. Other use cases may not make sense at all.

Arsen Chaloyan

May 5, 2021, 9:15:37 PM
to UniMRCP
I agree with the responses provided by Mayama and Michael, just a few comments from my side.

The UniMRCP project is an implementation of the MRCP standard. So, if you ask whether MRCP is well-suited for the transcription of recorded wav files, my answer would be no. The streaming speech transcription that MRCP primarily supports is good for real-time interactions.

On the other hand, that does not mean there is no requirement for a unified, preferably standardized, interface for STT, TTS, and the many other APIs governed under the AI umbrella. This is something that will be offered by Unispeech in the near future, beyond the MRCP scope. Stay tuned...






--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Lewis Bergman

May 5, 2021, 9:20:04 PM
to uni...@googlegroups.com
Thank you all for the informative responses. We may hold off until Unispeech is released and evaluate our options then, especially since mod_unimrcp seems to be a poor choice given that it is not maintained. Perhaps we will devise a mod_unispeech one day.