Your engine should have a corresponding streaming API to get
synthesized speech instead of writing it to a wave file. If so, you
should use that API to fill media frames from the MPF's read_frame
callback. See the demo_synth_stream_read or flite_synth_stream_read
functions available in the source tree.
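For illustration, a read_frame-style callback could look like the sketch below. The types and the my_engine_read() helper are simplified stand-ins invented for this example; the real definitions (mpf_frame_t and friends) live in the UniMRCP source tree.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the MPF frame types (illustrative only). */
typedef struct {
    void  *buffer; /* where the callback writes raw PCM */
    size_t size;   /* bytes expected per 10-msec frame */
} codec_frame_t;

#define MEDIA_FRAME_TYPE_NONE  0
#define MEDIA_FRAME_TYPE_AUDIO 1

typedef struct {
    int           type;
    codec_frame_t codec_frame;
} media_frame_t;

/* Toy stand-in for the engine's output queue. */
static uint8_t engine_pcm[4096];
static size_t  engine_avail = 0;

static size_t my_engine_read(void *dst, size_t size)
{
    size_t n = engine_avail < size ? engine_avail : size;
    memcpy(dst, engine_pcm, n);
    engine_avail -= n;
    return n;
}

/* The read_frame-style callback: fill the frame the MPF hands us. */
static int my_synth_stream_read(media_frame_t *frame)
{
    size_t n = my_engine_read(frame->codec_frame.buffer,
                              frame->codec_frame.size);
    if (n > 0) {
        frame->type = MEDIA_FRAME_TYPE_AUDIO;
        if (n < frame->codec_frame.size) {
            /* Pad a short tail with silence. */
            memset((uint8_t *)frame->codec_frame.buffer + n, 0,
                   frame->codec_frame.size - n);
        }
    }
    else {
        frame->type = MEDIA_FRAME_TYPE_NONE;
    }
    return 0;
}
```

The point of the shape: the MPF hands you a pre-allocated frame buffer; you copy whatever synthesized audio is ready and tag the frame type accordingly.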
--
Arsen Chaloyan
The author of UniMRCP
http://www.unimrcp.org
On Mon, Jul 19, 2010 at 7:23 PM, Iman Saleh <iman.sa...@gmail.com> wrote:
> Hi,
>
> I checked demo_synth_stream_read method. What I see is that it reads frames
> from the audio file and stores them in a buffer.
Yes, it doesn't perform actual synthesis; it just simulates the
job by reading frames from the audio input file.
> It is still not clear to me what the case will be on my side. Should the TTS
> engine add frames of the generated speech to frame->codec_frame.buffer?
Yes, you should write the generated speech to the mpf_frame
available within the demo_synth_stream_read() function. Note that
demo_synth_stream_read() is a callback invoked from the MPF context,
while your engine may produce speech from its own context. Check how
this is handled in the flite plugin using mpf_buffer.
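The core idea of that buffer is a mutex-guarded FIFO between the engine's own thread (which writes synthesized PCM as it becomes available) and the MPF context (which drains 10-msec chunks from the read callback). A minimal illustrative sketch follows; the names and layout are hypothetical, not the real mpf_buffer API.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Minimal thread-safe audio FIFO, in the spirit of the mpf_buffer the
 * flite plugin uses (illustrative, not the real UniMRCP API). */
typedef struct {
    uint8_t         data[65536];
    size_t          wpos, rpos;
    pthread_mutex_t guard;
} audio_buffer_t;

static void audio_buffer_init(audio_buffer_t *b)
{
    b->wpos = b->rpos = 0;
    pthread_mutex_init(&b->guard, NULL);
}

/* Called from the engine's thread: append synthesized PCM. */
static size_t audio_buffer_write(audio_buffer_t *b, const void *pcm, size_t size)
{
    pthread_mutex_lock(&b->guard);
    size_t room = sizeof(b->data) - b->wpos;
    size_t n = size < room ? size : room;
    memcpy(b->data + b->wpos, pcm, n);
    b->wpos += n;
    pthread_mutex_unlock(&b->guard);
    return n;
}

/* Called from the MPF context (the stream_read callback): drain a chunk. */
static size_t audio_buffer_read(audio_buffer_t *b, void *dst, size_t size)
{
    pthread_mutex_lock(&b->guard);
    size_t avail = b->wpos - b->rpos;
    size_t n = size < avail ? size : avail;
    memcpy(dst, b->data + b->rpos, n);
    b->rpos += n;
    pthread_mutex_unlock(&b->guard);
    return n;
}
```

The mutex is what lets the two contexts touch the same buffer safely; a real implementation would also wrap or grow the buffer instead of using a fixed array.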
>And what should I do with these frames?
Typically nothing.
>Should they be written to the generated pcm file (as in the demo)?
No!
>And which function is responsible for that?
No such function.
On Sun, Aug 1, 2010 at 9:41 PM, Iman Saleh <iman.sa...@gmail.com> wrote:
> Hi Arsen,
>
> I just want to make sure of a few things. I checked the flite plugin; the
> method flite_speak is the one responsible for synthesis. What I understand is
> that I should write a similar function that splits text using some criteria,
> and then in a loop I should perform synthesis and write the result to
> synth_channel->audio_buffer each time.
It depends on the capabilities of your engine. Some engines accept
SSML content; others can process only plain text.
>
> My questions are: is it possible to change the type of
> synth_channel->audio_buffer to suit the output of my TTS engine?
Your goal is to provide a synthesized media frame from the callback.
The callback is invoked every 10 msec, while engines usually produce
synthesized speech faster than real time. You may or may not
use mpf_audio_buffer. Again, it's up to you and the engine.
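For sizing those 10-msec frames: with 16-bit mono LPCM the byte count per frame follows directly from the sampling rate. A back-of-the-envelope helper (not part of the UniMRCP API):

```c
#include <stddef.h>

/* Bytes in one 10-msec frame of 16-bit mono LPCM:
 * (sample_rate / 100) samples per 10 msec, 2 bytes per sample. */
size_t frame_size_10ms(size_t sample_rate_hz)
{
    return sample_rate_hz / 100 * 2;
}
/* e.g. 8000 Hz -> 160 bytes, 16000 Hz -> 320 bytes */
```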
>
> Also I still cannot follow what is going on in flite_synth_stream_read. I
> understand it receives an empty frame as input and writes to it the content
> of synth_channel->audio_buffer at a point in time, but I still don't know
> when exactly it is called. And how can I use the filled frame in streaming?
This is a typical callback approach: it is called from the MPF core.
The RTP streaming is handled inside the stack. You should do nothing
but provide synthesized frames!
Hi Arsen,
Now I need to play the generated speech. I have read that the RTSP protocol should implement a PLAY method and that this method should be responsible for playing the streamed data.
How can I call it in UniMRCP? Or how can I allow the client to play the streamed media? I am using Voxeo as a client for a UniMRCP server.
Hi Arsen,
OK, I think the problem has something to do with the sampling rate. The generated speech has a sampling rate of 22050 Hz, while I can see that the sampling rates supported in UniMRCP are 8000, 16000, 32000 and 48000.
I think I should declare the supported sampling rates using something like the following (similar to the flite plugin):

mpf_codec_capabilities_add(
        &capabilities->codecs,
        MPF_SAMPLE_RATE_8000 | MPF_SAMPLE_RATE_16000,
        "LPCM");
Do I have to change the sampling rate of the generated speech, or can I add a new sampling rate?
And why are the two values MPF_SAMPLE_RATE_8000 and MPF_SAMPLE_RATE_16000 ORed?
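On the OR: the MPF_SAMPLE_RATE_* constants are bit flags, so multiple supported rates are combined into a single capability mask with bitwise OR. The sketch below mirrors that pattern with made-up names and values; the real enum is defined in mpf_codec_descriptor.h of the UniMRCP source tree.

```c
/* Illustrative bit-flag pattern for sample-rate capabilities
 * (hypothetical names/values, modeled on mpf_codec_descriptor.h). */
typedef enum {
    MY_SAMPLE_RATE_NONE  = 0x00,
    MY_SAMPLE_RATE_8000  = 0x01,
    MY_SAMPLE_RATE_16000 = 0x02,
    MY_SAMPLE_RATE_32000 = 0x04,
    MY_SAMPLE_RATE_48000 = 0x08
} my_sample_rates_e;

/* ORing flags builds a mask of all supported rates;
 * an AND test checks whether a given rate is in the mask. */
static int rate_supported(int mask, my_sample_rates_e rate)
{
    return (mask & rate) == rate;
}
```

Because each rate occupies its own bit, one integer can advertise any combination of supported rates at once.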