Hi,
I am trying to find out which UniMRCP components/licences I need for a speech translation solution on FreeSWITCH. I am keen to understand this from a developer’s perspective (I am one myself), but I am also tasked with finding out what the costs would be.
Here is a more detailed explanation of what I am trying to do…
I want to implement a two-way “real-time” speech translation feature on FreeSWITCH, using AWS Transcribe, AWS Translate and AWS Polly.
The plan is to use FreeSWITCH as a back-to-back user agent (B2BUA): take RTP from one channel, pass the audio to AWS Transcribe, pipe the resulting text into AWS Translate, and then use AWS Polly to play the translated text, as TTS, into the opposite channel (and the same in the other direction).
I have the AWS elements working in a simple test app (just using microphone input and headphone output), but I am unclear on how to get audio (RTP) out of one FreeSWITCH channel and play TTS back into the opposite channel. From what I’ve read, MRCP is the recommended mechanism for streaming audio out of, and back into, FreeSWITCH. The problem is, I don’t really understand how I would use it to achieve my objective, or which licensable component(s) I would need.
I have no previous experience with, or understanding of, MRCP, but I am thinking of something like this (showing just one direction)…
FreeSWITCH Channel A -> MRCP -> ? -> AWS Transcribe -> ? -> AWS Translate -> ? -> AWS Polly -> ? -> MRCP -> FreeSWITCH Channel B
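For context, the AWS side of my test app boils down to a pipeline like the sketch below. The three callables here are dummy stand-ins (just to show the wiring, so the snippet runs on its own); in my real code they wrap AWS Transcribe streaming, AWS Translate (`translate_text`) and AWS Polly (`synthesize_speech`) via boto3:

```python
# One translation direction, with the AWS stages injected as callables.
# The lambdas below are placeholder stand-ins for my real AWS wrappers.
from typing import Callable


def translate_leg(audio_in: bytes,
                  transcribe: Callable[[bytes], str],
                  translate: Callable[[str], str],
                  synthesize: Callable[[str], bytes]) -> bytes:
    """Audio from channel A -> translated TTS audio for channel B."""
    text = transcribe(audio_in)    # AWS Transcribe: speech -> text
    translated = translate(text)   # AWS Translate:  text -> text
    return synthesize(translated)  # AWS Polly:      text -> speech


if __name__ == "__main__":
    out = translate_leg(
        b"fake-rtp-audio",
        transcribe=lambda audio: "hello",
        translate=lambda text: "bonjour",
        synthesize=lambda text: text.encode(),
    )
    print(out)  # b'bonjour'
```

What I can’t see is what sits at the `?` points, i.e. how the audio gets from a FreeSWITCH channel into `transcribe`, and how the synthesized audio gets played into the opposite channel.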
BTW: my preferred language is Python, and that’s what most of my codebase is written in.
Any advice or guidance on what I need and how I would set it up would be most welcome. Thank you.
Andy