I'll give you some suggestions on how to think about this. This is how I think about these systems, but others may have different views.
The question I often ask is "what answers the telephone call?" Unimrcp does not answer phone calls. It lets the systems that do answer phone calls communicate with media services such as speech recognition and speech synthesis.
Phone calls can arrive using different technologies or protocols, but for your purposes a Voice-over-IP call using SIP and RTP is most likely.
In larger enterprises we often use systems that are dedicated to answering phone calls and running IVR applications. One class of these systems is called VXML voice browsers. They are made by companies like Avaya, Genesys, Cisco and others. As an example, here are some docs from Genesys:
https://docs.genesys.com/Documentation/GVP
Here is an example of how these can work together:
Telecom/VOIP -----> FreeSwitch -----> Unimrcp ------> ASR server
             SIP/RTP            MRCP        ASR protocol
A phone call arrives using SIP/RTP. It is answered by FreeSwitch.
FreeSwitch uses mod_unimrcp to communicate with the Unimrcp server. (FreeSwitch is the MRCP client.)
Unimrcp has a plugin to support your speech recognizer of choice. That plugin uses the ASR server to perform recognition.
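
If it helps to see the chain written down, here is a small Python sketch of the three hops described above. It is only an illustrative model of who talks to whom and over which protocol. The names Hop, CALL_PATH, answers_call and describe are made up for this example and are not part of FreeSwitch or Unimrcp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One link in the call path (hypothetical type, for illustration only)."""
    sender: str
    receiver: str
    protocol: str


# The three hops from the example above, in the order a call traverses them.
CALL_PATH = [
    Hop("Telecom/VOIP", "FreeSwitch", "SIP/RTP"),
    Hop("FreeSwitch", "Unimrcp", "MRCP"),
    Hop("Unimrcp", "ASR server", "ASR protocol"),
]


def answers_call(path):
    """Return the component that answers the call: the receiver of the first hop."""
    return path[0].receiver


def describe(path):
    """Return one human-readable line per hop."""
    return [f"{h.sender} -> {h.receiver} via {h.protocol}" for h in path]


if __name__ == "__main__":
    print("Answers the call:", answers_call(CALL_PATH))
    for line in describe(CALL_PATH):
        print(line)
```

The point of the model is the first hop: whatever sits at the receiving end of the SIP/RTP leg is what answers the call, and Unimrcp only appears one hop later.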
There are many ways to build a solution like the one you've described. I hope this helps.
- Michael