The State of Voice AI Instruction Following in 2026 (www.coval.dev)

🤖 AI Summary
In a recent discussion on the state of voice AI, Kwindla Hultman Kramer and Zach Koch addressed key challenges and advances in the field. Kramer announced a much-needed public benchmark for instruction following in voice AI, designed to evaluate models over long, multi-turn conversations that reflect real-world scenarios. The benchmark showed that cutting-edge models like GPT-5 and Gemini 3, while intelligent, struggle with latency, which makes them impractical for production use. As a result, many voice agents continue to rely on older models, largely because of the complexity and cost of switching to and evaluating new systems.

The conversation also highlighted persistent difficulties in benchmarking voice AI, particularly the unique requirements of instruction following in extended dialogues. Areas like back-channeling and prosody matching remain inadequately evaluated, which hinders the development of more natural interactions. Additionally, the trend toward multi-model architectures introduces new challenges in maintaining coherence and user experience. Both speakers emphasized that communication between developers and users is needed to bridge the gap between model performance and real-world application, noting that user feedback is crucial for advancing voice AI technology.