Hi,
I’m seeking for a Google-provided, production-ready voice module for my project, similar in function to OpenAI’s Whisper, but with a specific focus on real-time, speech-to-speech conversation.
Does Google (through Google Cloud, Gemini, or Vertex AI) offer a high-performance API or SDK that I can purchase and integrate?
The ideal service would need to:
Transcribe incoming audio from a user in real-time.
Process that transcription (e.g., send it to an AI for a response).
Synthesize the text response back into natural-sounding speech.
Deliver this synthesized voice back to the user with minimal latency to enable a fluid, real-time conversation.
In my research, I found a model named gemini-2.5-flash-native-audio-preview-09-2025. This seems promising, but I have a few specific questions:
Is this the correct model for my real-time speech-to-speech use case?
What is the status of this model? The “preview” tag suggests it might not be stable or ready for a live, production-level project. Can you confirm if this is project-ready?
If this is the right choice, what is the recommended integration path or SDK for building a full-duplex voice assistant around this model?
Regards,
Mehedi