12/12/2025 | Press release | Distributed by Public on 12/12/2025 11:30
Earlier this week, we introduced greater control over audio generation with an upgrade to our Gemini 2.5 Pro and Flash Text-to-Speech models.
But generating expressive speech is only one side of the conversation. Today, we're releasing an updated Gemini 2.5 Flash Native Audio for live voice agents. This update improves the model's ability to handle complex workflows, navigate user instructions, and hold natural conversations.
Gemini 2.5 Flash Native Audio is now available across Google products, including Google AI Studio and Vertex AI, and has started rolling out in Gemini Live and Search Live, bringing the naturalness of native audio to Search Live for the first time. This means you can brainstorm live with Gemini more effectively, get real-time help in Search Live, or build the next generation of enterprise-ready customer service agents.
Beyond powering helpful agents, native audio unlocks new possibilities for global communication. We're introducing live speech translation, a capability that enables streaming speech-to-speech translation for headphones. It preserves the speaker's intonation, pacing and pitch. This beta experience is rolling out in the Google Translate app starting today.
To enable the breadth of use cases across surfaces and products, we have improved Gemini 2.5 Flash Native Audio in three key areas:
Figure: The updated Gemini 2.5 Flash Native Audio's performance against previous versions and industry competitors on ComplexFuncBench.
Google Cloud customers are already using Gemini's native audio capabilities to drive real business results, from mortgage processing to customer calls.
Gemini now natively supports new live speech-to-speech translation capabilities designed to handle both continuous listening and two-way conversation.
With continuous listening, Gemini automatically translates speech in multiple languages into a single target language. This allows you to put headphones in and hear the world around you in your language.
For two-way conversation, Gemini's live speech translation handles translation between two languages in real time, automatically switching the output language based on who is speaking. For example, if you speak English and want to chat with a Hindi speaker, you'll hear English translations in real time in your headphones, while your phone broadcasts the Hindi translation when you're done speaking.
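The routing rule behind two-way conversation can be illustrated with a toy sketch. It assumes a hypothetical language-identification step has already labeled who spoke, and it only decides which language to translate into and where to play the result; the function, type, and device names are invented for illustration and do not describe how Google Translate or Gemini actually implement the feature.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical output decision; not a Google Translate or Gemini API type.
@dataclass
class Route:
    target_language: str
    output_device: str  # "headphones" or "phone_speaker" (illustrative labels)


def route_two_way(speaker_language: str, user_language: str,
                  partner_language: str) -> Optional[Route]:
    """Toy model of two-way mode: the output language flips with the speaker."""
    if speaker_language == partner_language:
        # The partner spoke: the user hears a translation in their headphones.
        return Route(target_language=user_language, output_device="headphones")
    if speaker_language == user_language:
        # The user spoke: the phone plays the translation aloud for the partner.
        return Route(target_language=partner_language, output_device="phone_speaker")
    # Speech in neither language of the pair: this sketch simply ignores it.
    return None
```

With English as the user's language and Hindi as the partner's, Hindi speech routes to English in the headphones and English speech routes to Hindi on the phone speaker. Continuous listening would be the simpler case in this model: every detected language maps to one fixed target, always played in the headphones.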
Gemini's live speech translation has a number of key capabilities that help in the real world:
Starting today, you can try it in a new beta experience in the Google Translate app for real-time translation in your headphones by connecting them to your device and tapping "Live translate." This experience is rolling out to all Android devices in the US, Mexico and India, with support for iOS and more regions coming soon.
Based on feedback, we will continue to iterate on this experience and bring it to more Google products like the Gemini API in 2026.
Start building voice agents today with Gemini 2.5 Flash Native Audio, now generally available on Vertex AI and in preview in the Gemini API. Read our developer docs or try it directly in Google AI Studio.
Gemini 2.5 Flash and 2.5 Pro text-to-speech models are also available via the Gemini API in Google AI Studio. Read the speech generation docs, explore the prompting guide, or check out the Gemini API Cookbook to get started.