Deepgram, a voice AI platform for enterprise use cases, announced the general availability of its Voice Agent API, a unified voice-to-voice interface that gives developers full control to build context-aware voice agents capable of natural, responsive conversations. The API combines speech-to-text (STT), text-to-speech (TTS), and large language model (LLM) orchestration with contextualized conversational logic in a single architecture. Developers can use Deepgram’s fully integrated stack, built on its Nova-3 STT and Aura-2 TTS models, or bring their own LLM and TTS models. The API pairs simplicity for developers with the controllability enterprises need to deploy voice agents.
Because speech-to-text, LLM reasoning, and text-to-speech sit behind one API, the Voice Agent API has built-in support for real-time conversational dynamics. Capabilities such as barge-in handling (a user interrupting the agent mid-response) and turn-taking prediction are model-driven and managed natively within the platform.
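To make the interaction loop concrete, here is a minimal Python sketch of one conversational turn passing through speech-to-text, LLM, and text-to-speech stages, with playback stopping when the user barges in. It is a toy illustration only, not Deepgram's API: `VoiceAgentSketch`, `handle_turn`, `barge_in_at`, and `make_demo_agent` are hypothetical names, and the three stages are trivial stand-ins for real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VoiceAgentSketch:
    """Toy STT -> LLM -> TTS loop. Hypothetical; not Deepgram's API."""

    transcribe: Callable[[bytes], str]      # stand-in for a speech-to-text model
    think: Callable[[str], str]             # stand-in for an LLM
    synthesize: Callable[[str], List[str]]  # stand-in for TTS, returns audio "chunks"
    played: List[str] = field(default_factory=list)

    def handle_turn(self, audio: bytes, barge_in_at: Optional[int] = None) -> str:
        """Run one turn; stop playback at chunk index `barge_in_at` if set."""
        text = self.transcribe(audio)
        reply = self.think(text)
        for index, chunk in enumerate(self.synthesize(reply)):
            # Barge-in: the user starts speaking, so the agent stops talking.
            if barge_in_at is not None and index >= barge_in_at:
                return "interrupted"
            self.played.append(chunk)
        return "completed"


def make_demo_agent() -> VoiceAgentSketch:
    """Build an agent whose stages are simple string functions (assumptions)."""
    return VoiceAgentSketch(
        transcribe=lambda audio: audio.decode("utf-8"),
        think=lambda text: f"You said: {text}",
        synthesize=lambda reply: reply.split(),
    )


if __name__ == "__main__":
    agent = make_demo_agent()
    print(agent.handle_turn(b"hello there", barge_in_at=2), agent.played)
```

In this sketch the caller decides when the interruption happens. In the platform described here, barge-in and turn-taking are predicted by models and handled natively, so the application would not need to orchestrate this logic itself.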
While the Voice Agent API streamlines development, it also gives teams control over performance, behavior, and scalability in production. The platform is built on Deepgram’s Enterprise Runtime, and because Deepgram owns the models across the voice AI stack, it can optimize at the model level in every layer of the interaction loop. This allows precise tuning of latency, barge-in handling, turn-taking, and domain-specific behavior in ways not possible with disconnected components.
https://deepgram.com/learn/voice-agent-api-generally-available