
AI Voice | 6 min read | 4 April 2026

VAPI Pricing Explained: How Every Component of Your Bill Is Actually Calculated

A transparent breakdown of VAPI's per-minute billing model - platform fee, LLM tokens, TTS audio, STT processing, and telephony - so you can predict costs before you scale.

HAROON MOHAMED

Automation, CRM, and full-stack systems

Author

Verification note: This post was re-reviewed in May 2026. Public tool pricing, compliance rules, and platform capabilities should be checked against the source list at the end before making budget, legal, or deployment decisions. Private client metrics are not published unless they are safe, public, and verifiable.

VAPI bills per minute - but that's not the whole story

When you first deploy a VAPI agent, you look at the pricing page and see "$0.05/min platform fee." You think: at 1,000 minutes/month, that's $50. Easy.

Then your first invoice arrives and it's closer to $200.

The reason: VAPI's platform fee is only one of five cost components that make up every minute of call time. The other four are pass-through costs that VAPI doesn't set, but they still show up on your bill.

Here's each component, what it actually costs at current public rates, and where you have control over the number.

Component 1: VAPI platform fee

Current rate (per VAPI's public pricing, 2026): $0.05/minute

This is VAPI's own margin. It covers the infrastructure that orchestrates your call - routing audio between the LLM, STT, TTS, and telephony provider.

Control: None. This is baked in.

Optimization: Long-term, this is the one cost you can't drive down. Everything else is tunable.

Component 2: LLM inference

Current rates (OpenAI, April 2026):

GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
GPT-4o Mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens

Current rates (Anthropic):

Claude Sonnet 4.5: $3 / 1M input, $15 / 1M output
Claude Haiku 4.5: $1 / 1M input, $5 / 1M output

How it adds up: Every turn of conversation sends the full conversation history plus system prompt to the LLM. A 4-minute call typically consists of 15-25 turns. If your system prompt is 1,500 tokens and each turn adds ~200 tokens of history, the cumulative input tokens for the call can reach 30,000-60,000.
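The accumulation described above can be sketched in a few lines. This is a minimal model, assuming every turn resends the system prompt plus all earlier history with no prompt caching or truncation; the function name and defaults are illustrative, not part of any VAPI or OpenAI API.

```python
def cumulative_input_tokens(turns, system_prompt_tokens=1500, tokens_per_turn=200):
    """Sum the input tokens sent across all turns of one call.

    Assumption: every turn resends the system prompt plus the full
    history of earlier turns, with no prompt caching or truncation.
    """
    total = 0
    history = 0
    for _ in range(turns):
        total += system_prompt_tokens + history
        history += tokens_per_turn  # this turn becomes history for the next one
    return total


print(cumulative_input_tokens(15))  # 43500
```

Because history is resent on every turn, the total grows faster than linearly with the number of turns, so longer calls are disproportionately expensive on input tokens.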

Example calculation:

4-minute call with GPT-4o
50,000 input tokens + 3,000 output tokens across the conversation
Cost: (50,000 x $2.50/1M) + (3,000 x $10/1M) = $0.125 + $0.030 = $0.155/call
Per minute: $0.04/min

Same call with GPT-4o Mini: (50,000 x $0.15/1M) + (3,000 x $0.60/1M) = about $0.009/call, or roughly $0.002-$0.003/min - a reduction of more than 90%.
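The same arithmetic as a small helper, useful for plugging in your own token counts. The rates are the ones quoted in this article and may be out of date; the dictionary keys and function name are illustrative.

```python
# USD per 1M tokens as (input, output), copied from the rates quoted above.
# Check the provider's pricing page before relying on them.
LLM_RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def llm_cost_per_call(model, input_tokens, output_tokens):
    """Return the LLM cost in USD for one call."""
    input_rate, output_rate = LLM_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


CALL_MINUTES = 4
for model in LLM_RATES:
    per_call = llm_cost_per_call(model, 50_000, 3_000)
    print(f"{model}: ${per_call:.4f}/call, ${per_call / CALL_MINUTES:.4f}/min")
```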

Control: Full. You choose the model in your VAPI agent config.

Component 3: Text-to-Speech (TTS)

Current rates (April 2026, approximate):

ElevenLabs: $0.18 per 1,000 characters (flagship tier)
Cartesia Sonic: $0.025 per 1,000 characters
Rime AI: $0.04 per 1,000 characters
Azure Neural: $0.016 per 1,000 characters
PlayHT: $0.05 per 1,000 characters

How it adds up: TTS is billed per character of text sent for synthesis, not per minute of call. A 4-minute call where the AI speaks for 90 seconds typically outputs 1,500-2,500 characters of text.

Example calculation (2,000 characters):

ElevenLabs: 2,000 x $0.00018 = $0.36/call = $0.09/min
Cartesia: 2,000 x $0.000025 = $0.05/call = $0.0125/min
Azure: 2,000 x $0.000016 = $0.032/call = $0.008/min

That's a range of more than 10x between the cheapest and most expensive provider for the exact same call.
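A rough sketch of the per-character calculation, so you can swap in your own character counts. The rates are the approximate figures listed above; the provider keys are illustrative labels, not official identifiers.

```python
# USD per 1,000 characters, from the approximate rates listed above.
TTS_RATES_PER_1K_CHARS = {
    "elevenlabs": 0.18,
    "cartesia": 0.025,
    "rime": 0.04,
    "azure": 0.016,
    "playht": 0.05,
}


def tts_cost_per_call(provider, characters):
    """Return the TTS cost in USD for the text synthesized during one call."""
    return characters / 1000 * TTS_RATES_PER_1K_CHARS[provider]


CALL_MINUTES = 4
for provider in ("elevenlabs", "cartesia", "azure"):
    per_call = tts_cost_per_call(provider, 2_000)
    print(f"{provider}: ${per_call:.3f}/call, ${per_call / CALL_MINUTES:.4f}/min")
```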

Control: Full. You select the TTS provider in VAPI config.

Component 4: Speech-to-Text (STT)

Current rates (April 2026):

Deepgram Nova-3: $0.0043/minute (streaming)
AssemblyAI Universal: $0.0037/minute
OpenAI Whisper (via API): $0.006/minute
Google Cloud STT: $0.016/minute (enhanced model)

Control: Full. VAPI supports all major providers.

Note: STT is one of the smallest cost components for most calls. Across the providers listed above, the difference between the cheapest and most expensive is about $0.012 per minute. Not worth obsessing over.
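To put a number on that spread, here is a small sketch using the rates listed above. It assumes STT is billed on the full call duration, which may not match how a given provider meters audio; the keys are illustrative labels.

```python
# USD per minute of audio, from the rates listed above.
STT_RATES_PER_MINUTE = {
    "deepgram-nova-3": 0.0043,
    "assemblyai-universal": 0.0037,
    "openai-whisper": 0.006,
    "google-cloud-enhanced": 0.016,
}


def stt_cost_per_call(provider, call_minutes):
    """Return the STT cost in USD, assuming the whole call is transcribed."""
    return STT_RATES_PER_MINUTE[provider] * call_minutes


# Gap between the most and least expensive provider, per minute.
spread_per_minute = max(STT_RATES_PER_MINUTE.values()) - min(STT_RATES_PER_MINUTE.values())

print(f"Deepgram, 4-minute call: ${stt_cost_per_call('deepgram-nova-3', 4):.4f}")
print(f"Spread across providers: ${spread_per_minute:.4f}/min")
```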

Component 5: Telephony

Current rates (April 2026):

Twilio US phone number: $1.15/month rental
Twilio US outbound call: $0.014/minute (SMS is billed separately, per segment)
Twilio Toll-free: $2.00/month rental + $0.019/min outbound
VAPI native telephony (if used): $0.03/min (higher, but simpler)

Additional costs most people miss:

A2P 10DLC brand registration (Twilio): $4 one-time + $10/month per campaign
STIR/SHAKEN registration: included in most Twilio plans, but required if you call at volume
CNAM display (your business name on caller ID): $5-$15/month per number

Control: Partial. You choose the provider, but the per-minute rate is set by carriers.
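Fixed monthly fees matter more at low volume than the per-minute rate suggests. The sketch below folds them into an effective per-minute rate; the defaults use the Twilio figures quoted above, and `fixed_extras` is a hypothetical catch-all for items such as CNAM display.

```python
def monthly_telephony_cost(minutes, per_minute_rate=0.014, number_rental=1.15,
                           fixed_extras=0.0):
    """Return total monthly telephony cost in USD.

    fixed_extras is a hypothetical catch-all for flat monthly fees
    (for example CNAM display) that do not scale with minutes.
    """
    return minutes * per_minute_rate + number_rental + fixed_extras


def effective_rate(minutes, **kwargs):
    """Return the all-in telephony cost per minute for the month."""
    return monthly_telephony_cost(minutes, **kwargs) / minutes


for minutes in (1_000, 10_000):
    rate = effective_rate(minutes, fixed_extras=5.0)
    print(f"{minutes} min/month: ${rate:.4f}/min effective")
```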

Putting it all together

A 4-minute call with an optimized stack in 2026:

| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o Mini) | $0.012 |
| TTS (Cartesia) | $0.05 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.335/call |
| Per minute | $0.084/min |

Same call with default/premium settings:

| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o) | $0.16 |
| TTS (ElevenLabs) | $0.36 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.793/call |
| Per minute | $0.198/min |

At 10,000 minutes/month, that's $840 vs $1,980 - a $1,140/month difference for the same functional output.
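The two stacks can be compared with a short script. This is a sketch that takes the per-call component costs above as given inputs; the dictionary and function names are illustrative.

```python
CALL_MINUTES = 4

# Per-call component costs in USD for a 4-minute call, as listed above.
OPTIMIZED = {"platform": 0.20, "llm": 0.012, "tts": 0.05, "stt": 0.017, "telephony": 0.056}
PREMIUM = {"platform": 0.20, "llm": 0.16, "tts": 0.36, "stt": 0.017, "telephony": 0.056}


def summarize(components, call_minutes, monthly_minutes):
    """Return (cost per call, cost per minute, projected monthly cost)."""
    per_call = sum(components.values())
    per_minute = per_call / call_minutes
    return per_call, per_minute, per_minute * monthly_minutes


for name, stack in (("optimized", OPTIMIZED), ("premium", PREMIUM)):
    per_call, per_minute, monthly = summarize(stack, CALL_MINUTES, 10_000)
    print(f"{name}: ${per_call:.3f}/call, ${per_minute:.3f}/min, ${monthly:,.2f}/month")
```

The unrounded monthly totals come out slightly different from the figures in the text, which round the per-minute rate to three decimals before multiplying.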

The pricing-page trap

When evaluating VAPI vs. competitors (Retell, Bland, Vocode), don't compare platform fees. Compare the full stack cost at your usage volume with your chosen providers. A platform with a $0.07/min fee but better default pricing on LLM/TTS can easily be cheaper overall than one with a $0.04/min fee and expensive defaults.

Always build a spreadsheet with your actual config before choosing a platform.
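A spreadsheet works fine, but the comparison is also a few lines of code. In this sketch both platforms and their pass-through rates are hypothetical: the platform fees are the $0.07 and $0.04 examples above, and the pass-through figures are assumptions loosely based on the optimized and premium stacks in this article, not real vendor quotes.

```python
def full_stack_rate(platform_fee, pass_through):
    """Return the all-in cost per minute in USD: platform fee plus pass-through."""
    return platform_fee + pass_through


# Hypothetical platforms. Pass-through = LLM + TTS + STT + telephony per minute.
platforms = {
    "platform_a": full_stack_rate(platform_fee=0.07, pass_through=0.034),  # cheap defaults
    "platform_b": full_stack_rate(platform_fee=0.04, pass_through=0.148),  # expensive defaults
}

MONTHLY_MINUTES = 10_000
for name, rate in platforms.items():
    print(f"{name}: ${rate:.3f}/min, ${rate * MONTHLY_MINUTES:,.0f}/month")

cheapest = min(platforms, key=platforms.get)
print(f"Cheapest at this volume: {cheapest}")
```

With these assumed inputs, the platform with the higher fee wins by a wide margin, which is the point of comparing the full stack instead of the headline fee.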

Where to verify these numbers

All rates above are pulled from each provider's public pricing page as of April 2026. Verify them yourself:

VAPI: vapi.ai/pricing
OpenAI: openai.com/api/pricing
Anthropic: anthropic.com/pricing
Cartesia: cartesia.ai/pricing
ElevenLabs: elevenlabs.io/pricing
Deepgram: deepgram.com/pricing
Twilio: twilio.com/pricing

Prices change frequently. Check these pages before building a business case.

Want help modeling the full cost stack for your specific use case? Get in touch - I've done this spreadsheet exercise for 13+ client deployments.

