I have a chatbot in Dialogflow that is connected to WhatsApp through Landbot. At the moment, when a user sends me an audio message, Landbot forwards it to Dialogflow as-is (literally as a link to the .oga file), but I need to transcribe it first so that Dialogflow can understand it. For example, this is the intent response:
{
"id": "ffdc721f-8e94-4a9e-9ec8-a45b9b7f21be-37284719",
"fulfillmentText": "Pronunciaste mal. \n La palabra era: Dog",
"language_code": "en",
"queryText": "https://media.eu-1.smooch.io/apps/5d2370ef6667cd00102fb9c2/conversations/31f2d6d5440d03fde4066b35/hJ9ob-c2Ogho4NTDLEuyFMg_/5zKf3-SgGMuS3RxcBUw5B6dj.oga",
"webhookPayload": {},
"intentDetectionConfidence": 0.3,
"action": "",
"webhookSource": "",
"parameters": {
"pronunciacion": "https://media.eu-1.smooch.io/apps/5d2370ef6667cd00102fb9c2/conversations/31f2d6d5440d03fde4066b35/hJ9ob-c2Ogho4NTDLEuyFMg_/5zKf3-SgGMuS3RxcBUw5B6dj.oga",
"palabra": "Dog"
},
"fulfillmentMessages": [
{
"text": {
"text": [
"Pronunciaste mal. \n La palabra era: Dog"
]
}
}
],
"diagnosticInfo": {
"webhook_latency_ms": "1871.0"
},
"webhookStatus": {
"webhookStatus": {
"message": "Webhook execution successful"
},
"webhookUsed": true
},
"intent": {
"isFallback": false,
"displayName": "Pronunciar",
"id": "4dd12af2-94a6-486b-a1cd-daa2c65d6671"
}
}
So the values of "pronunciacion" and "palabra" should be the same once the audio has been converted to text, but right now "pronunciacion" only contains the link to the audio file. I don't know how to convert the audio to text. Could you help me?
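
To make the goal concrete, here is a rough sketch of the flow I have in mind for the webhook. It is not a confirmed solution. It assumes a Python webhook, the `google-cloud-speech` client library with credentials already configured, and that the WhatsApp voice note is OGG/Opus at 16 kHz (the sample rate is a guess). The function names `download_audio`, `transcribe_ogg_opus`, `normalize`, `check_pronunciation` and `build_response`, and the "Correct!" message, are hypothetical and only for illustration.

```python
import urllib.request


def download_audio(url: str) -> bytes:
    """Fetch the raw audio bytes from the media URL that Landbot forwards."""
    # strip() removes stray whitespace or line breaks around the URL
    with urllib.request.urlopen(url.strip()) as response:
        return response.read()


def transcribe_ogg_opus(audio_bytes: bytes, language_code: str = "en-US") -> str:
    """Transcribe audio with Google Cloud Speech-to-Text.

    Needs the google-cloud-speech package and valid credentials.
    """
    # Imported here so the rest of the sketch runs without the library
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.OGG_OPUS,
        sample_rate_hertz=16000,  # assumption: adjust to the real audio
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace before comparing."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def check_pronunciation(parameters: dict,
                        download=download_audio,
                        transcribe=transcribe_ogg_opus) -> bool:
    """Return True when the spoken audio matches the expected word."""
    audio_bytes = download(parameters["pronunciacion"])
    spoken = transcribe(audio_bytes)
    return normalize(spoken) == normalize(parameters["palabra"])


def build_response(parameters: dict, is_correct: bool) -> dict:
    """Build a minimal Dialogflow webhook response body."""
    if is_correct:
        text = "Correct!"  # hypothetical success message
    else:
        text = f"Pronunciaste mal. \n La palabra era: {parameters['palabra']}"
    return {"fulfillmentText": text}
```

The webhook would read "pronunciacion" and "palabra" from the request parameters, call `check_pronunciation`, and return the result of `build_response` as JSON. The `download` and `transcribe` arguments are injectable so the comparison can be tried without network access or credentials.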