Deepgram

Realtime voice AI API platform

A developer speech AI API platform for realtime STT, TTS, and voice agents through Nova, Flux, Aura, and the Voice Agent API.

Korean I/O · API available · Commercial use OK


Edge

vs. similar tools: It focuses on realtime voice-agent infrastructure, including turn detection and interruption handling, beyond STT and TTS.

Overview

At a glance

Offers Nova-3 STT and Flux realtime conversational recognition
Designed around 50+ languages and low latency
Series C funding and 1,300+ organizations building on its APIs are publicly reported by Deepgram
Requires more API integration work than a finished transcription app
Advanced voice-agent usage needs cost modeling
Best for: Developer teams building realtime voice agents and high-scale transcription APIs

Deepgram is a speech AI API platform for developer teams adding realtime recognition and voice agents to products. Nova-3 handles high-accuracy transcription, Flux focuses on realtime conversational recognition and turn detection, and Aura covers speech generation. It is especially strong for embedding live conversational voice experiences into products, not just batch file transcription.

Its strength is latency and scale. Deepgram pricing starts with a $200 free credit and usage-based billing, with Nova-3 published from $0.0048 per minute and Flux from $0.0065 per minute. Deepgram says Nova-3 supports more than 50 languages, and its 2026 announcement reported a $130 million Series C, a $1.3 billion valuation, and more than 1,300 organizations building with its APIs. Cloud APIs and self-hosted or on-premises options are both part of its positioning.
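As a rough illustration of what the free credit covers, the sketch below divides the $200 credit by the per-minute rates quoted above. It assumes the published "from" prices apply with no add-ons, which may not match a real invoice. The function and variable names are illustrative, not part of any Deepgram SDK.

```python
# Rates quoted in the text above, in USD per audio minute.
# These are "from" prices, so actual rates may be higher.
FREE_CREDIT_USD = 200.0
RATE_PER_MINUTE = {"nova-3": 0.0048, "flux": 0.0065}


def minutes_covered(credit_usd: float, rate_per_minute: float) -> float:
    """Return how many audio minutes a credit balance buys at a flat rate."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return credit_usd / rate_per_minute


for model, rate in RATE_PER_MINUTE.items():
    minutes = minutes_covered(FREE_CREDIT_USD, rate)
    print(f"{model}: ~{minutes:,.0f} minutes (~{minutes / 60:,.0f} hours)")
```

At these rates the credit works out to roughly 41,667 minutes (about 694 hours) of Nova-3 or roughly 30,769 minutes (about 513 hours) of Flux.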

The tradeoff is operational complexity. Because it is an API embedded into a product, teams need to design audio capture, streaming connections, error handling, and cost monitoring. Voice Agent API and advanced capabilities are billed by connection time or add-on usage, so contact-center and high-volume products should simulate usage before rollout. For a ready-made meeting-notes workflow, Otter.ai is simpler.
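A usage simulation can start as a small script like the one below. It is a minimal sketch with these assumptions: the workload figures are hypothetical, billing is a flat per-minute rate, and add-ons are excluded. The Flux "from" price is used as an example because no Voice Agent API rate is quoted here, so substitute the rate from your own quote.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical traffic profile for a voice product."""
    calls_per_day: int
    avg_call_minutes: float
    days_per_month: int = 30


def monthly_minutes(w: Workload) -> float:
    """Total billable audio or connection minutes per month."""
    return w.calls_per_day * w.avg_call_minutes * w.days_per_month


def monthly_cost(w: Workload, rate_per_minute: float,
                 remaining_credit_usd: float = 0.0) -> float:
    """Estimated monthly spend after subtracting any remaining credit.

    Assumes a flat per-minute rate. The free credit is a one-time
    balance, so pass it only for the month in which it is consumed.
    """
    gross = monthly_minutes(w) * rate_per_minute
    return max(0.0, gross - remaining_credit_usd)


# Illustrative scenarios only; replace with measured traffic.
scenarios = {
    "pilot": Workload(calls_per_day=100, avg_call_minutes=4.0),
    "expected": Workload(calls_per_day=2_000, avg_call_minutes=4.0),
    "peak": Workload(calls_per_day=6_000, avg_call_minutes=5.0),
}

FLUX_RATE = 0.0065  # USD per minute, the "from" price quoted in the text

for name, workload in scenarios.items():
    cost = monthly_cost(workload, FLUX_RATE)
    print(f"{name}: {monthly_minutes(workload):,.0f} min, ${cost:,.2f}/month")
```

Running the same scenarios against each candidate rate makes it easier to see where average call length or concurrency starts to dominate the bill.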

Deepgram is a strong fit for product teams that need low latency and high-volume realtime voice infrastructure. AssemblyAI can be easier to evaluate when the focus is file transcription, summaries, and speech-understanding APIs. If the goal is no-setup meeting notes and searchable conversations for a team, compare Otter.ai as well.

Pricing

Plan	Monthly price	Limits
Pay As You Go	200 credits	$200 free credit, then Nova-3 starts at $0.0048 per minute
Growth	-	Annual prepaid credits, public model endpoints, and higher concurrency
Custom	-	Custom contract for custom models, enterprise support, and deployment needs

Specs

Languages: 50+
Real-time: Supported
API: Yes
Open source: No
Self-hosting: Available
Korean support: Input/output only
Commercial use: Allowed

Popularity

Buzz and recognition on absolute thresholds

Absolute-threshold score

High confidence · 4/4 signals

Each axis maps to a 1-10 absolute threshold where 10 means broadly recognizable. Collected: 2026-06-16.

Verified public benchmark: $1.3B valuation and 1,300+ organizations reported by Deepgram (as of 2026-01-13)

By popularity

ElevenLabs
Its expressive, multilingual voice quality and rich API and ecosystem have made it an industry standard.
Otter.ai
It delivers realtime transcription and meeting summaries as a finished app, then extends into connected-app search and follow-up workflows.
HeyGen
It leads in avatar presenter videos and multilingual dubbing quality.
Fireflies.ai
With 100+ transcription languages, AskFred, CRM/work-app integrations, and APIs, it is built to turn meeting notes into automated workflows.
AssemblyAI
It goes beyond transcription by packaging natural-language prompting, keyterm boosts, medical mode, and voice agent APIs in one platform.


Last updated: 2026-06-16
