Google LLC

04/15/2026 | Press release | Distributed by Public on 04/15/2026 10:41

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Today, we're introducing Gemini 3.1 Flash TTS, our latest text-to-speech model. It delivers improved controllability, expressivity and quality, empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out:

For developers in preview via the Gemini API and Google AI Studio
For enterprises in preview on Vertex AI
For Workspace users via Google Vids

Improved speech quality and controllability

We've improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date. On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind human preferences, 3.1 Flash TTS achieved an impressive Elo score of 1,211.

Artificial Analysis has also positioned Gemini 3.1 Flash TTS within its "most attractive quadrant" for its ideal blend of high-quality speech generation and low cost. The model stands out further with native multi-speaker dialogue, support for 70+ languages, and granular creative control via natural language.

New audio tags for more expressive speech generation

3.1 Flash TTS also introduces audio tags: an intuitive way to control vocal style, pace and delivery. By embedding natural-language commands directly in the text input, you can steer the generated speech with finer granularity than before.

You can start experimenting with audio tags, along with other updates to the developer experience, in Google AI Studio, whose configurable controls place the developer in the "director's chair":

Scene direction: Set the stage by defining the environment and providing specific dialogue instructions. This world-building context helps characters remain "in-character" and react to one another naturally across multiple turns.
Speaker-level specificity: Cast characters using unique Audio Profiles, then use Director's Notes to adjust pace, tone and accent. Inline tags let a speaker depart from these high-level settings to change expression mid-sentence.
Seamless export: Once the performance is perfected, these exact parameters can be exported as Gemini API code to ensure consistent, recognizable voices across various projects and platforms.
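To make the "director's chair" workflow more concrete, the sketch below assembles a scene description, per-speaker notes and tagged dialogue lines into a single text prompt. It is purely illustrative: the announcement does not specify the tag vocabulary or prompt layout that 3.1 Flash TTS expects, so the square-bracket tags (`[whispering]`, `[laughs]`), the "Speaker: text" line format and the helper names `Speaker`, `Line` and `build_script` are all hypothetical, not part of the Gemini API.

```python
from dataclasses import dataclass


# Hypothetical helper types, for illustration only (not Gemini API classes).
@dataclass
class Speaker:
    name: str
    director_notes: str  # high-level pace, tone and accent for this character


@dataclass
class Line:
    speaker: str
    text: str  # may contain inline audio tags; [square brackets] are an assumption


def build_script(scene: str, speakers: list[Speaker], lines: list[Line]) -> str:
    """Join scene direction, per-speaker notes and tagged dialogue into one prompt."""
    known = {s.name for s in speakers}
    parts = [f"Scene: {scene}", ""]
    for s in speakers:
        parts.append(f"Director's notes for {s.name}: {s.director_notes}")
    parts.append("")
    for line in lines:
        # Reject lines from characters that were never cast.
        if line.speaker not in known:
            raise ValueError(f"unknown speaker: {line.speaker}")
        parts.append(f"{line.speaker}: {line.text}")
    return "\n".join(parts)


script = build_script(
    scene="A late-night radio studio; two hosts wrap up the show.",
    speakers=[
        Speaker("Ana", "warm, unhurried pace"),
        Speaker("Ben", "dry humor, slightly faster pace"),
    ],
    lines=[
        Line("Ana", "That's all for tonight. [whispering] Don't tell the producer."),
        Line("Ben", "[laughs] Too late, she's listening."),
    ],
)
print(script)
```

Keeping the scene and speaker notes in structured form, rather than a hand-edited string, is one way to reuse the same "performance" across projects; the resulting text would then be sent as the input to whatever TTS request the exported Gemini API code defines.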

With these new configurations, developers can enhance precision for specific scenarios, creating memorable characters and immersive audio experiences.

Get started with high-fidelity speech generation in the Google AI Studio Playground.

Built for global scale

Gemini 3.1 Flash TTS delivers high-fidelity speech and more precise control across more than 70 languages. These core optimizations bring advanced style, pacing and accent control to major markets, helping developers create localized, expressive speech experiences for users at global scale.

Early developer and enterprise testers are already seeing the impact of 3.1 Flash TTS, highlighting its impressive controllability and expressivity. They've told us how audio tags provide a new level of creative precision, transforming simple text into a high-fidelity vocal performance.

Watermarked with SynthID

All audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID. This imperceptible watermark is interwoven directly into the audio output, so AI-generated content can be reliably detected, helping to prevent misinformation.

Google LLC published this content on April 15, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on April 15, 2026 at 16:41 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]