Polly issues with vendor-specific parameters

RauL Baldeon

Mar 6, 2024, 8:12:26 PM
to UniMRCP

Hello everyone,

I hope you're all doing well.

I'm encountering an issue with my MRCP server involving Polly profiles. I have two Polly profiles: one requires a standard voice and the other a neural voice.

According to the documentation, a value specified in the umspolly.xml file can be overridden per request; this has been possible since version 1.11.0. Vendor-specific parameters can also be passed as query parameters of the xml:base attribute in SSML. Here is an example of SSML content with the vendor-specific parameter voice-engine set to neural:

<speak version='1.0' xml:lang='es-US' xml:base='http://localhost/settings?voice-engine=neural' xmlns='http://www.w3.org/2001/10/synthesis'>
    <p>Su reserva ha sido confirmada. Gracias por confiar en nosotros.</p>
</speak>
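
For anyone generating these requests from code, here is a minimal Python sketch of how the xml:base query string could be assembled. The helper name build_ssml and its arguments are hypothetical, and http://localhost/settings is simply the placeholder base URL from the example above; only the voice-engine parameter comes from the documentation.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr

# Hypothetical helper, not part of UniMRCP: builds an SSML document whose
# xml:base attribute carries vendor-specific parameters as a query string.
def build_ssml(text, lang, vendor_params, base_url="http://localhost/settings"):
    # Encode the parameters, e.g. {"voice-engine": "neural"} -> voice-engine=neural
    query = urlencode(vendor_params)
    base = f"{base_url}?{query}" if query else base_url
    # quoteattr() adds the surrounding quotes and escapes '&' between parameters
    return (
        f"<speak version='1.0' xml:lang={quoteattr(lang)} "
        f"xml:base={quoteattr(base)} "
        "xmlns='http://www.w3.org/2001/10/synthesis'>"
        f"<p>{escape(text)}</p>"
        "</speak>"
    )

ssml = build_ssml(
    "Su reserva ha sido confirmada. Gracias por confiar en nosotros.",
    "es-US",
    {"voice-engine": "neural"},
)
print(ssml)
```

Building the attribute this way keeps the query string correctly escaped if more than one vendor-specific parameter is passed.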

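However, even though the request specifies the neural engine (through the xml:base query parameter, alongside a Voice-Variant: Neural header), the MRCP server still sends the request to Amazon Polly with the standard engine. The log excerpt below shows this in the "Start Async Synth" line, which reports engine [standard]: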

2024-03-07 00:17:18:402173 [INFO]   Receive MRCPv2 Data 1.1.1.1:1544 <-> 2.2.2.2:35092 [436 bytes]
MRCP/2.0 436 SPEAK 1
Channel-Identifier: 0323331144da4959@speechsynth
Content-Type: application/ssml+xml
Voice-Gender: female
Voice-Variant: Neural
Speech-Language: es-US
Voice-Name: Lupe
Content-Length: 218

<speak version='1.0' xml:lang='es-US' xml:base='http://localhost/settings?voice-engine=neural' xmlns='http://www.w3.org/2001/10/synthesis'> <p>Su reserva ha sido confirmada. Gracias por confiar en nosotros.</p></speak>
2024-03-07 00:17:18:402203 [INFO]   Assign Control Channel <0323331144da4959@speechsynth> to Connection 1.1.1.1:1544 <-> 2.2.2.2:35092 [0] -> [1]
2024-03-07 00:17:18:402237 [INFO]   Process SPEAK Request <0323331144da4959@speechsynth> [1]
2024-03-07 00:17:18:402378 [NOTICE] Read AWS Credentials /opt/unimrcp/data/aws.credentials.cliente1
2024-03-07 00:17:18:402413 [INFO]   Create Polly Client: thread pool [1] region []
2024-03-07 00:17:18:402461 [INFO]   Start Async Synth encoding [4] sampling-rate [8000] language [US Spanish] voice [Lupe] engine [standard] <0323331144da4959@polly>
<speak version="1.0" xml:lang="es-US"> <p>Su reserva ha sido confirmada. Gracias por confiar en nosotros.</p></speak>
2024-03-07 00:17:18:510002 [INFO]   Synthesis Complete: characters [63] size [72868 bytes] <0323331144da4959@polly>
2024-03-07 00:17:18:510075 [INFO]   Process SPEAK Response <0323331144da4959@speechsynth> [1]
2024-03-07 00:17:18:510084 [NOTICE] State Transition IDLE -> SPEAKING <0323331144da4959@speechsynth>


It seems that, despite the voice-engine=neural setting in the request, the server is not using the neural voice as expected.
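
To check whether this happens on every request, a small script can compare the engine requested in xml:base with the one reported in the "Start Async Synth" log line. This is only a sketch that assumes the log layout shown in the excerpt above; the function names are hypothetical and nothing here is part of UniMRCP.

```python
import re
from urllib.parse import urlparse, parse_qs

# Patterns assume the log layout shown in the excerpt above.
BASE_RE = re.compile(r"xml:base=['\"]([^'\"]+)['\"]")
ENGINE_RE = re.compile(r"Start Async Synth .*?engine \[([^\]]*)\]")

def requested_engine(line):
    """Return the voice-engine value from an xml:base URL on this line, if any."""
    m = BASE_RE.search(line)
    if not m:
        return None
    values = parse_qs(urlparse(m.group(1)).query).get("voice-engine")
    return values[0] if values else None

def used_engine(line):
    """Return the engine reported by a 'Start Async Synth' log line, if any."""
    m = ENGINE_RE.search(line)
    return m.group(1) if m else None

def find_mismatches(lines):
    """Pair each requested engine with the next reported engine and keep the differences."""
    mismatches = []
    pending = None
    for line in lines:
        req = requested_engine(line)
        if req is not None:
            pending = req
        used = used_engine(line)
        if used is not None:
            if pending is not None and used != pending:
                mismatches.append((pending, used))
            pending = None
    return mismatches

sample = [
    "<speak version='1.0' xml:lang='es-US' xml:base='http://localhost/settings?voice-engine=neural' xmlns='http://www.w3.org/2001/10/synthesis'>",
    "2024-03-07 00:17:18:402461 [INFO]   Start Async Synth encoding [4] sampling-rate [8000] language [US Spanish] voice [Lupe] engine [standard] <0323331144da4959@polly>",
]
print(find_mismatches(sample))  # prints [('neural', 'standard')]
```

The pairing is sequential, so it is only reliable when requests are not interleaved in the log.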

Any insights or assistance on resolving this matter would be greatly appreciated.

Thank you all for your help.

Best regards,

Raúl
