How to implement UniMRCP synth plugin by HTTP calls to TTS web service?

Greg

May 9, 2023, 3:28:03 PM
to UniMRCP
I'm trying to implement the UniMRCP synth plugin by HTTP calls to an existing TTS web service.

Given an HTTP request with a text parameter, this existing TTS web service can synthesize and return the corresponding audio (either in one piece, or as a chunked stream).

I was able to build unimrcp (1.7.0) with dependencies (1.6.0), and compile a scaffold synth plugin based on unimrcp/plugins/demo-synth/src/demo_synth_engine.c

However, I can't find an explanation of how to implement that synth plugin using HTTP calls. Should I just change the code of demo_synth_channel_speak() to download synth_channel->audio_file via an HTTP call?

Separately, what about the chunked stream scenario?  Are there some helpful code samples?


Arsen Chaloyan

May 11, 2023, 4:20:26 PM
to uni...@googlegroups.com
Hi Greg,

It is purely up to the plugin implementation how to place the HTTP request and whether to process the response in one shot or in multiple chunks. The latter is preferable, but for prompts of average length it will not make a significant difference in response time. You can use the mpf_buffer to write the audio chunks received from the API and feed them to the MPF layer from the read callback.

One precaution: do not block the callbacks invoked in the plugin context. Place the HTTP requests asynchronously instead.
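
The chunk-buffering idea above can be sketched in plain C. This is not the mpf_buffer API; `audio_fifo_t`, `fifo_write` and `fifo_read_frame` are hypothetical names for a stand-in byte FIFO, used only to show the pattern: the HTTP side appends chunks of any size, and the read callback pulls fixed-size frames, padding with silence when the stream has not caught up. The frame size in the usage note assumes 8 kHz, 16-bit mono audio and 10 ms frames (160 bytes). Locking between the two sides is omitted here and would be needed in a real plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a media buffer: a fixed-capacity byte FIFO. */
#define FIFO_CAPACITY 4096

typedef struct {
    unsigned char data[FIFO_CAPACITY];
    size_t head; /* index of the next byte to read */
    size_t size; /* number of bytes currently stored */
} audio_fifo_t;

void fifo_init(audio_fifo_t *f)
{
    assert(f != NULL);
    f->head = 0;
    f->size = 0;
}

/* Producer side: call when an HTTP chunk arrives.
 * Returns the number of bytes accepted (less than len if the FIFO is full). */
size_t fifo_write(audio_fifo_t *f, const void *chunk, size_t len)
{
    const unsigned char *src = chunk;
    size_t room = FIFO_CAPACITY - f->size;
    size_t n = len < room ? len : room;
    for (size_t i = 0; i < n; i++)
        f->data[(f->head + f->size + i) % FIFO_CAPACITY] = src[i];
    f->size += n;
    return n;
}

/* Consumer side: call from the read callback once per frame.
 * Copies up to frame_size bytes and zero-fills (silence) the remainder.
 * Returns the number of real audio bytes placed in the frame. */
size_t fifo_read_frame(audio_fifo_t *f, void *frame, size_t frame_size)
{
    unsigned char *dst = frame;
    size_t n = frame_size < f->size ? frame_size : f->size;
    for (size_t i = 0; i < n; i++)
        dst[i] = f->data[(f->head + i) % FIFO_CAPACITY];
    f->head = (f->head + n) % FIFO_CAPACITY;
    f->size -= n;
    memset(dst + n, 0, frame_size - n);
    return n;
}
```

For example, after writing one 200-byte chunk, the first 160-byte frame read returns 160 bytes of audio, and the second returns the remaining 40 bytes followed by 120 bytes of silence.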

--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Greg

May 19, 2023, 2:11:42 PM
to UniMRCP
Arsen, thanks for your response.

Even before implementing HTTP calls, I ran into this problem when trying to start with the demo-synth mock-up:

WARNING[14269]: chan_sip.c:7508 sip_write: Asked to transmit frame type slin16, while native formats is (g722) read/write = g722/slin16

This has the effect of not hearing the audio (in zoiper5 softphone), while the rest of the setup (asterisk / unimrcpserver / zoiper5) seems to be working fine, with response status 200 OK, and normal-looking logs (except for the above warning).

Seemingly, that warning indicates that the SIP channel is being asked to transmit a frame in the slin16 format (signed linear PCM at 16 kHz), but the native format for the channel is g722 (G.722 wideband audio, sampled at 16 kHz).

Now, the only reference to pcm I see in the demo synth plugin code is this one:
char *file_name = apr_psprintf(channel->pool,"demo-%dkHz.pcm",descriptor->sampling_rate/1000);

And the actual demo files are:
$ find -name "*.pcm"
./data/demo-8kHz.pcm
./data/demo-16kHz.pcm

However, in Zoiper5 we use the G.722 codec, and I don't see the slin16 codec listed. Normally unimrcp should be able to convert slin16 to/from g722, right? It's working for our ASR but not for TTS.

Greg

Arsen Chaloyan

Jun 16, 2023, 7:24:08 PM
to uni...@googlegroups.com
Hi Greg,

The problem with chan_sip in Asterisk has very little to do with the implementation of a speech synthesizer plugin for UniMRCP Server. This is purely a configuration problem in Asterisk, which you need to address first. BTW, UniMRCP 1.8.0 supports the G.722 codec, so you should be able to establish 16 kHz audio end to end for both ASR and TTS sessions.

Perhaps you should tackle one problem at a time and focus on making the solution work first with 8-kHz audio end to end.

Greg

Aug 15, 2023, 10:19:57 PM
to UniMRCP
Hi Arsen, thanks for your advice.

I switched from Asterisk to FreeSWITCH, and was able to get a mock audio response from my scaffold TTS plugin -- which is almost identical to your demo_synth_engine.c except for demo_synth_channel_speak() where I replaced the demo audio file name from "demo-%dkHz.pcm" to "johnsmith-8kHz.pcm".

Now, should I simply insert an HTTP request to my TTS service, save the result to (say) "tts.pcm" and use it as the demo audio file name?  And do it all inside demo_synth_channel_speak()?  I'm really a novice here, what's the easiest way to achieve an initial implementation, and maybe improve it later?

Guy

Arsen Chaloyan

Oct 20, 2023, 5:35:20 PM
to uni...@googlegroups.com
Hi Guy,

> I switched from Asterisk to FreeSWITCH, and was able to get a mock audio response from my scaffold TTS plugin -- which is almost identical to your demo_synth_engine.c except for demo_synth_channel_speak() where I replaced the demo audio file name from "demo-%dkHz.pcm" to "johnsmith-8kHz.pcm".
From the plugin implementation perspective, it should not really matter whether you use Asterisk, FreeSWITCH or any other MRCP-compliant platform.

> Now, should I simply insert an HTTP request to my TTS service, save the result to (say) "tts.pcm" and use it as the demo audio file name?  And do it all inside demo_synth_channel_speak()? 
It is not required to save synthesized audio data received in the HTTP response to a file. You can supply in-memory audio data to the mpf callback.
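
A minimal sketch of the in-memory approach, assuming the whole HTTP response body is raw PCM already held in a buffer. `mem_audio_t` and `mem_audio_read_frame` are hypothetical names, not UniMRCP types; in a real plugin the same logic would sit in the stream read callback in place of the file read that the demo plugin performs, with the destination being the frame's codec buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for synthesized audio kept in memory (raw PCM). */
typedef struct {
    const unsigned char *audio; /* whole HTTP response body */
    size_t length;              /* total bytes of audio */
    size_t offset;              /* bytes already played */
} mem_audio_t;

/* Fills one frame from memory, zero-padding a short final frame.
 * Returns 1 if the frame carries audio, or 0 once the audio is exhausted,
 * which is the point where a plugin would report that speaking completed. */
int mem_audio_read_frame(mem_audio_t *m, void *frame, size_t frame_size)
{
    assert(m->offset <= m->length);
    size_t left = m->length - m->offset;
    size_t n = left < frame_size ? left : frame_size;
    if (n == 0) {
        memset(frame, 0, frame_size); /* nothing left: emit silence */
        return 0;
    }
    memcpy(frame, m->audio + m->offset, n);
    memset((unsigned char *)frame + n, 0, frame_size - n);
    m->offset += n;
    return 1;
}
```

With 400 bytes of audio and 160-byte frames, the first two calls return full frames, the third returns 80 bytes of audio padded with silence, and the fourth returns 0.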

> I'm really a novice here, what's the easiest way to achieve an initial implementation, and maybe improve it later?
The easiest way would probably be to place the HTTP request synchronously and wait for the response in demo_synth_channel_speak(). However, that is not the right approach: HTTP requests and responses must be handled asynchronously so that the plugin can process concurrent requests.
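
One way to picture the asynchronous approach is a worker thread per request: the speak handler starts the job and returns at once, and the result is picked up later without blocking. The sketch below uses plain POSIX threads only to stay self-contained; UniMRCP itself builds on APR, so a real plugin would more likely use its own task or thread facilities. All names (`tts_job_t`, `tts_job_start`, `fake_tts_http_fetch`, and so on) are hypothetical, and the HTTP call is faked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request job state shared between threads. */
typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    char text[256];
    unsigned char *audio;
    size_t audio_len;
    int done;
} tts_job_t;

/* Placeholder for the blocking HTTP call to the TTS service (assumption):
 * it fakes two bytes of silent audio per input character. */
unsigned char *fake_tts_http_fetch(const char *text, size_t *len)
{
    *len = strlen(text) * 2;
    return calloc(*len ? *len : 1, 1);
}

/* Runs on the worker thread, so blocking here does not stall the caller. */
void *tts_worker(void *arg)
{
    tts_job_t *job = arg;
    size_t len = 0;
    unsigned char *audio = fake_tts_http_fetch(job->text, &len);
    pthread_mutex_lock(&job->lock);
    job->audio = audio;
    job->audio_len = audio ? len : 0;
    job->done = 1;
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* What a speak handler would do: start the job and return immediately.
 * Returns 0 on success. */
int tts_job_start(tts_job_t *job, const char *text)
{
    assert(job != NULL && text != NULL);
    memset(job, 0, sizeof *job);
    strncpy(job->text, text, sizeof job->text - 1);
    pthread_mutex_init(&job->lock, NULL);
    return pthread_create(&job->thread, NULL, tts_worker, job);
}

/* Cheap status check, suitable for polling from a read callback. */
int tts_job_is_done(tts_job_t *job)
{
    pthread_mutex_lock(&job->lock);
    int done = job->done;
    pthread_mutex_unlock(&job->lock);
    return done;
}

/* Waits for the worker to exit; call only at teardown, not from a callback. */
void tts_job_wait(tts_job_t *job)
{
    pthread_join(job->thread, NULL);
}

/* Releases the job's resources after tts_job_wait(). */
void tts_job_free(tts_job_t *job)
{
    pthread_mutex_destroy(&job->lock);
    free(job->audio);
    job->audio = NULL;
}
```

A thread per request is the simplest form of this idea. With many concurrent sessions, a shared worker pool or a non-blocking HTTP client would scale better, at the cost of more code.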
