Improved Gemini audio models for powerful voice interactions (blog.google)

0 points 206 days ago | visit original

🤖 AI Summary

Google recently announced significant updates to its Gemini 2.5 Flash Native Audio model, enhancing its capabilities for live voice interactions. The upgrade focuses on how the model handles complex workflows, follows user instructions, and maintains natural conversations, all key capabilities for customer service and real-time assistance. On function calling, the model scores 71.5% on the ComplexFuncBench evaluation, and its adherence rate to developer instructions has risen from 84% to 90%, supporting higher user satisfaction and smoother multi-turn conversations.

In addition to these enhancements, Gemini 2.5 introduces live speech-to-speech translation, giving users real-time translations in over 70 languages while preserving natural speech characteristics. The feature automatically detects spoken languages and translates them in real time, even amid background noise, which makes multilingual conversations easier. It is launching in the Google Translate app, a step toward more inclusive global communication. Businesses are already using these advancements to improve user interactions, with users reportedly often forgetting that they are chatting with an AI.
