Re: [UniMRCP] Integration between FreeSWITCH, UniMRCP, and Java service for ASR with grammar selection


mayamatakeshi

Mar 21, 2024, 7:55:06 PM
to uni...@googlegroups.com


On Fri, Mar 22, 2024 at 4:19 AM Daniel Oliveira <odsda...@gmail.com> wrote:
Hello everyone,

I'm working on a project for automatic speech recognition (ASR), and I'm facing a challenge in integrating FreeSWITCH, UniMRCP, and my Java service. My goal is as follows:

Data Flow:
- FreeSWITCH -> UniMRCP -> my Java service (which receives the audio packets).

Processing:
- My Java service transcribes the audio packets into text.
- A large language model (LLM) is applied to determine the appropriate response based on the transcription. This model selects a suitable grammar for the response.

Return to FreeSWITCH:
- The response, including the grammar selected by the LLM, should be sent back to FreeSWITCH to finalize the ASR interaction.
I'm trying to understand how I can implement the component that receives the audio packets from UniMRCP in my Java service. I've seen suggestions about creating a plugin, but I'm struggling to find examples or clear guidance on how to do this.

Additionally, I have questions about integration with third-party transcription services like Azure Speech Recognition. They offer webhooks to send transcriptions, but it's not clear to me how I can return the grammar selected by the LLM and send it back to FreeSWITCH to continue the ASR process.

Has anyone worked on a similar project or have suggestions on how I can approach this integration effectively?


From what I gather, your Java service would be some sort of triage system and would not be doing the actual interaction with the caller (I assume this is some sort of IVR, not just ASR).
Instead, it would determine intent, and maybe language or sentiment, and hand over control to the actual IVR system.
If this assumption is correct, I think you could use the mod_audio_stream module:
it would send audio to your Java service via WebSocket, and your service would reply with the initial transcription and a grammar, which would then be used to start the actual interaction with the speech services behind UniMRCP.
So you would not need to implement a UniMRCP plugin.
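
For illustration, a minimal sketch of the receiving side in Java could look like the one below, using the Jakarta WebSocket API. It assumes mod_audio_stream delivers the call audio as binary WebSocket frames; the /audio endpoint path and the handleAudioChunk hook are hypothetical placeholders for your transcription pipeline.

  import java.nio.ByteBuffer;
  import jakarta.websocket.OnClose;
  import jakarta.websocket.OnMessage;
  import jakarta.websocket.OnOpen;
  import jakarta.websocket.Session;
  import jakarta.websocket.server.ServerEndpoint;

  // Hypothetical endpoint; mod_audio_stream would be pointed at ws://host:port/audio.
  @ServerEndpoint("/audio")
  public class AudioStreamEndpoint {

      @OnOpen
      public void onOpen(Session session) {
          // One WebSocket session per call leg.
          System.out.println("Audio stream opened: " + session.getId());
      }

      @OnMessage
      public void onAudio(ByteBuffer frame, Session session) {
          // Assumption: binary frames carry raw audio from mod_audio_stream.
          byte[] chunk = new byte[frame.remaining()];
          frame.get(chunk);
          handleAudioChunk(session, chunk); // placeholder: feed the ASR pipeline
      }

      @OnClose
      public void onClose(Session session) {
          System.out.println("Audio stream closed: " + session.getId());
      }

      private void handleAudioChunk(Session session, byte[] chunk) {
          // Speech-to-text and LLM logic would go here.
      }
  }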

Daniel Oliveira

Mar 22, 2024, 9:37:13 AM
to UniMRCP
Hello,

Thank you for your response and the suggestion of the mod_audio_stream module. I'd like to explain the project flow in more detail to ensure we're on the same page.

The goal of the project is to implement an Interactive Voice Response (IVR) system where customers call a number and are directed to the IVR. The IVR is based on FreeSWITCH and then forwards the call to my Java service.

In my Java service, I'd like to capture the customer's audio in real-time as they describe the issue with a device purchased from my store. Based on the transcription of this audio, the system needs to make a decision on which action to take. This could involve transferring the call to a remote technician for remote problem resolution, or scheduling a visit with an on-site technician.

What I'm looking for is a solution to receive these audio packets in real-time in my Java service, along with the grammars associated with the available options so that I can process the transcription and make the appropriate decision. I mention the grammars because I believe they are necessary for the flow.

Based on this flow, do you think the mod_audio_stream module would be suitable for this purpose? Or do you have any other suggestions on how to implement this integration?

If you are available for freelance work with this type of integration, don't hesitate to contact me. We are looking to hire someone who can help with this part of the project.

Thank you again for your assistance, and I look forward to your guidance.

Best regards,
Daniel.

mayamatakeshi

Mar 25, 2024, 6:35:52 PM
to uni...@googlegroups.com
On Fri, Mar 22, 2024 at 10:37 PM Daniel Oliveira <odsda...@gmail.com> wrote:
Hello,

Thank you for your response and the suggestion of the mod_audio_stream module. I'd like to explain the project flow in more detail to ensure we're on the same page.

The goal of the project is to implement an Interactive Voice Response (IVR) system where customers call a number and are directed to the IVR. The IVR is based on FreeSWITCH and then forwards the call to my Java service.

In my Java service, I'd like to capture the customer's audio in real-time as they describe the issue with a device purchased from my store. Based on the transcription of this audio, the system needs to make a decision on which action to take. This could involve transferring the call to a remote technician for remote problem resolution, or scheduling a visit with an on-site technician.

What I'm looking for is a solution to receive these audio packets in real-time in my Java service, along with the grammars associated with the available options so that I can process the transcription and make the appropriate decision. I mention the grammars because I believe they are necessary for the flow.

OK. I think a grammar is not relevant here.
I think you just need a reference to the chat session so that, for example, if you transfer the call and the transfer fails, it can go back to the same chat session.
 

Based on this flow, do you think the mod_audio_stream module would be suitable for this purpose? Or do you have any other suggestions on how to implement this integration?

I think yes, mod_audio_stream would be usable for this and would save you some time.

Your service just needs to process the audio and send back some messages to control FreeSWITCH.
A basic interface with a module/script on the FreeSWITCH side could look like this:
  - {"type": "synth-speech", "text": "Hi, how can I help you?"}
  - {"type": "stop-speech-synth"} (sent when your service detects that the user has started to speak; this is necessary to stop any prompt we might still be playing if the user talks mid-prompt)
  - {"type": "transfer", "destination": "SOME_DESTINATION", "chat_ref": "SOME_CHAT_ID"}
In case of a transfer, the chat session with your service would end.
If the transfer then fails, the chat_ref would be passed back when the WebSocket connection to your service is re-created, so that the chat can continue from where it stopped.
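
To illustrate, a small helper that emits these control messages from the Java side might look like the sketch below. The message schema is the one proposed above; the use of Jackson for the JSON encoding is just an assumption.

  import java.io.IOException;
  import java.util.Map;
  import com.fasterxml.jackson.databind.ObjectMapper;
  import jakarta.websocket.Session;

  // Sketch of the control channel back to FreeSWITCH, using the message
  // types proposed above; each message is sent as a JSON text frame.
  public class FreeswitchControl {

      private static final ObjectMapper MAPPER = new ObjectMapper();

      public static void synthSpeech(Session session, String text) throws IOException {
          send(session, Map.of("type", "synth-speech", "text", text));
      }

      public static void stopSpeechSynth(Session session) throws IOException {
          send(session, Map.of("type", "stop-speech-synth"));
      }

      public static void transfer(Session session, String destination, String chatRef) throws IOException {
          send(session, Map.of("type", "transfer", "destination", destination, "chat_ref", chatRef));
      }

      private static void send(Session session, Map<String, String> message) throws IOException {
          session.getBasicRemote().sendText(MAPPER.writeValueAsString(message));
      }
  }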
 

If you are available for freelance work with this type of integration, don't hesitate to contact me. We are looking to hire someone who can help with this part of the project.


I have actually never used mod_audio_stream.
Instead, I've been using mod_audio_fork, but I am planning to switch to mod_audio_stream because it is simpler to build (every time I change FreeSWITCH versions I need to review the build of mod_audio_fork).
I've been doing research about it here:
Eventually, there will be a prototype there that you could use as a basis to implement your solution.

Anyway, this is outside the scope of UniMRCP, so unless you decide to implement a UniMRCP plugin for your service (which would also be a valid alternative), it is better to ask for help in other forums.


Daniel Oliveira

Mar 26, 2024, 9:20:12 AM
to uni...@googlegroups.com
Hello,

Thanks for the feedback and the suggestion of the mod_audio_stream module. I'd like to share more details about the flow we're looking to implement to ensure we're on the same page.

I would like to share a diagram illustrating the desired flow of the system:

[Diagram: customer call flow through the FreeSWITCH IVR and our Java service]

In this diagram, we can see the flow of the client's interaction with the IVR system, with a focus on the interaction with our Java service for processing. Let me explain each step:

1. Customer calls the number and interacts with the FreeSWITCH IVR:
   The client initiates the connection and interaction with the FreeSWITCH IVR. In the flow above, the call first reaches an endpoint for our internal control and is then handed to the ASR part, which is my Java project.

2. Audio data packets sent to the Java service:
   The FreeSWITCH IVR sends the audio data packets, containing the client's speech, to our Java service for processing. This step is crucial, as our Java application is responsible for converting speech to text and performing further processing.

3. Processing in the Java service:
   Our Java service receives the audio data packets and processes them as needed. This involves converting speech to text and analyzing the content to identify the problem reported by the customer.

4. Decision making and response:
   Based on the analysis of the speech content, our Java service determines the action to be taken. This may involve selecting an option for further interaction or initiating a specific process to resolve the customer's issue.

5. Response sent to FreeSWITCH:
   Once the decision is made, our Java service sends the response, containing the selected option or action, back to the FreeSWITCH IVR. This response guides the IVR on how to proceed with the customer interaction (a rough sketch of steps 4 and 5 follows below).
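
To make steps 4 and 5 concrete, here is a rough sketch of the decision step in Java, reusing the FreeswitchControl helper sketched earlier in this thread. The classifyIntent method is a placeholder for the LLM-based analysis, and the intent names and transfer destination are hypothetical.

  import java.io.IOException;
  import jakarta.websocket.Session;

  public class DecisionStep {

      // Called once a complete utterance has been transcribed.
      public void onTranscription(Session session, String transcript, String chatRef) throws IOException {
          String intent = classifyIntent(transcript); // placeholder for the LLM call

          switch (intent) {
              case "remote-support":
                  // Hand the caller over to a remote technician (hypothetical destination).
                  FreeswitchControl.transfer(session, "remote_technician_queue", chatRef);
                  break;
              case "schedule-visit":
                  // Keep the caller in the IVR and confirm the visit by voice.
                  FreeswitchControl.synthSpeech(session, "I will schedule an on-site visit for you.");
                  break;
              default:
                  FreeswitchControl.synthSpeech(session, "Could you describe the problem in more detail?");
          }
      }

      private String classifyIntent(String transcript) {
          // Placeholder: call the language model and map its output to an intent label.
          return "remote-support";
      }
  }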
I hope this diagram helps clarify the role of our Java service in processing customer interactions and how it integrates with the FreeSWITCH IVR. We are committed to ensuring a smooth integration of our service with FreeSWITCH.

Please feel free to get in touch if you have any further questions or need further clarification.

Yours sincerely,
Daniel.




