
AI Voice / 9 min read / 8 October 2025

AI Calling Agent Cost Controls: A Practical Optimization Breakdown

A privacy-safe breakdown of the levers that affect AI calling costs: model choice, voice provider, speech-to-text, telephony, prompt length, and call flow design.

HAROON MOHAMED

Automation, CRM, and full-stack systems

Author

Verification note: This post was re-reviewed in May 2026. Public tool pricing, compliance rules, and platform capabilities should be checked against the source list at the end before making budget, legal, or deployment decisions. Private client metrics are not published unless they are safe, public, and verifiable.

The cost that was eating the margin

AI calling agents are powerful. They're also expensive if you build them naively.

A realistic AI calling cost audit starts by separating public provider rates from private client outcomes. The exact savings from any deployment depend on call length, answer rate, model choice, voice provider, speech-to-text provider, telephony provider, and how much the agent talks.

For public writing, I do not publish client-specific cost reductions unless they can be verified without exposing private accounts. The useful lesson is still clear: stacked voice-agent costs can move dramatically when you tune the model, prompt, provider choices, and call flow.

Here is the cost-control framework I use.

Understanding where the cost comes from

VAPI (and most AI calling platforms) bill per minute based on the components in your stack:

LLM inference cost - The language model processing the conversation (GPT-4, Claude, etc.)
TTS (text-to-speech) cost - Converting the AI's text response to audio (ElevenLabs, Cartesia, etc.)
STT (speech-to-text) cost - Converting the caller's speech to text (Deepgram, AssemblyAI, etc.)
Platform fee - VAPI's own per-minute fee on top
Telephony cost - Twilio or VAPI's own calling numbers

The default VAPI setup uses:

GPT-4 (most expensive LLM tier)
ElevenLabs (expensive TTS with premium voices)
Deepgram (reasonable, but at default tier)
VAPI platform fee (~$0.05/min)
Twilio telephony

Adding all these up at default settings gives you $1.20-$1.80/minute. The range depends on conversation length and LLM response length.
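As a rough sketch of how the stack adds up, the per-minute rate is simply the sum of the component rates. The figures below are the illustrative "before" values from the cost breakdown later in this post, not quoted vendor prices, and the names `StackRates` and `per_minute_total` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StackRates:
    """Per-minute rates in USD for each layer of a calling stack (hypothetical type)."""
    llm: float
    tts: float
    stt: float
    platform: float
    telephony: float

    def per_minute_total(self) -> float:
        # Sum every component rate; round to avoid float noise in the display.
        return round(sum(getattr(self, f.name) for f in fields(self)), 4)


# Illustrative values only; check each provider's current pricing page.
default_stack = StackRates(llm=0.65, tts=0.30, stt=0.12, platform=0.18, telephony=0.05)
print(f"${default_stack.per_minute_total():.2f}/min")
```

Modelling each layer separately makes it obvious which component dominates before you start swapping providers.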

The opportunity: every single component is configurable. And the cheapest option in each category often has equivalent quality for a structured qualification use case.

Optimisation #1: Switch the LLM

The biggest single cost reduction came from switching the LLM.

From: GPT-4 (~$0.03/1K tokens input, $0.06/1K tokens output)
To: GPT-4o Mini (~$0.00015/1K tokens input, $0.0006/1K tokens output)

At those rates, that's roughly a 200x reduction on input tokens and a 100x reduction on output tokens.
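To make the arithmetic concrete, here is a small sketch that compares the two price lists quoted above. The prices are the ones stated in this post and should be checked against current vendor pricing; the helper names and the example token counts are hypothetical.

```python
# Per-1K-token prices in USD, as quoted in this post (verify before budgeting).
GPT4 = {"input": 0.03, "output": 0.06}
GPT4O_MINI = {"input": 0.00015, "output": 0.0006}


def price_ratio(old: dict, new: dict) -> dict:
    """How many times cheaper `new` is than `old`, per token type."""
    return {kind: old[kind] / new[kind] for kind in old}


def llm_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """LLM cost in USD for a given number of input and output tokens."""
    return (prices["input"] * input_tokens + prices["output"] * output_tokens) / 1000


ratios = price_ratio(GPT4, GPT4O_MINI)
print({kind: round(r) for kind, r in ratios.items()})

# Hypothetical call: 20,000 input tokens and 1,000 output tokens in total.
print(llm_cost(GPT4, 20_000, 1_000), llm_cost(GPT4O_MINI, 20_000, 1_000))
```

Because input tokens usually dominate in a multi-turn call, the input-side ratio tends to matter more than the output-side one.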

The concern with switching to a smaller model: will the AI agent be dumber? Will it misunderstand leads? Will it go off-script?

The answer, for a structured qualification call, is: not meaningfully.

A qualification call has a fixed script. The AI asks 8 predetermined questions. It categorises responses into buckets (Yes/No, numeric ranges, multiple choice). It doesn't need to reason deeply about novel situations. It needs to execute a structured conversation reliably.

GPT-4o Mini does this perfectly well. The conversation feels identical to GPT-4, and qualification accuracy did not drop meaningfully.

Where you'd keep GPT-4: If your AI agent needs to handle complex objections, make nuanced judgment calls, or adapt significantly to unexpected conversation directions. For structured qualification, the smaller model is sufficient.

LLM cost reduction: ~$0.60/min saved

Optimisation #2: Switch the voice provider

ElevenLabs has premium voice quality - genuinely impressive. But for a qualification call that's going to run at scale, you don't need premium.

From: ElevenLabs (premium tier, ~$0.30/min)
To: Cartesia (~$0.04/min)

Cartesia's voices are good. Not ElevenLabs level, but for a professional business call, they're entirely appropriate. The test I use: would someone pause the conversation because the voice sounds robotic? With Cartesia at default settings: no.

We tested this on a small controlled call sample, comparing the exact same script with ElevenLabs vs. Cartesia. Completion rate, qualification rate, and caller satisfaction (measured by whether they stayed on the call and engaged) showed no meaningful difference between the two.

The voice is a smaller part of call quality than most people assume. The script, the pacing, and the relevance of the questions matter more than whether the TTS voice is "premium."

Alternative: Rime AI is another solid option at comparable pricing. Worth A/B testing against Cartesia for your specific use case.

TTS cost reduction: ~$0.26/min saved

Optimisation #3: Prompt engineering to reduce LLM token usage

This is often overlooked because it requires understanding how LLMs are billed.

LLMs charge per token - both input (your system prompt + conversation history) and output (the AI's responses).

The original system prompt was 1,800 tokens. It included:

Full company background
Detailed script with every possible response
Long instructions for edge cases
Examples for how to handle various objections

Every single API call during the conversation sent this entire 1,800-token prompt as context.

What we changed:

Reduced the system prompt to 600 tokens - keeping only what the model actually needed to execute the call
Removed examples (the model didn't need them for a structured script)
Made the AI's responses shorter - instructed it to give concise, natural responses rather than verbose ones
Moved static reference information (company details, pricing ranges) into a separate tool call that only fired when needed, rather than being in the system prompt always
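One way to picture the last change: static reference data sits behind a lookup function that the agent calls only when a caller asks about it, so the system prompt stays short. This is a plain-Python sketch, not the platform's actual tool API; `REFERENCE_DATA`, `SYSTEM_PROMPT`, `lookup_reference`, and the sample values are all hypothetical.

```python
# Hypothetical reference store; in a real build this could be a database or CMS.
REFERENCE_DATA = {
    "company": "Placeholder company background; replace with real copy.",
    "pricing": "Placeholder pricing range; replace with real figures.",
}

# The system prompt carries only instructions, not the reference text itself.
SYSTEM_PROMPT = (
    "You are a qualification agent. Ask the scripted questions in order. "
    "Keep replies short. If the caller asks about the company or pricing, "
    "call lookup_reference with the topic instead of guessing."
)


def lookup_reference(topic: str) -> str:
    """Return reference text for a topic, or a safe fallback if unknown."""
    key = topic.strip().lower()
    return REFERENCE_DATA.get(key, "No reference information available for that topic.")


print(lookup_reference("pricing"))
```

The design choice is that reference text is paid for only on the turns that need it, instead of on every request.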

Result: Average token usage per call dropped by ~55%. On a 4-minute call, that's significant.
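A minimal sketch of why the system prompt size matters so much: if every request resends the system prompt plus the conversation so far, input tokens grow with both prompt size and turn count. The turn count and per-turn token size below are assumptions chosen for illustration, and `total_input_tokens` is a hypothetical helper, so the output will not match any real deployment exactly.

```python
def total_input_tokens(system_tokens: int, turns: int, tokens_per_turn: int) -> int:
    """Input tokens billed across one call, assuming every request resends
    the system prompt plus all conversation history accumulated so far."""
    total = 0
    history = 0
    for _ in range(turns):
        total += system_tokens + history
        history += tokens_per_turn  # caller + agent text added by this turn
    return total


# Assumed call shape: 16 turns, about 60 tokens of new dialogue per turn.
before = total_input_tokens(system_tokens=1800, turns=16, tokens_per_turn=60)
after = total_input_tokens(system_tokens=600, turns=16, tokens_per_turn=60)
print(before, after, f"{1 - after / before:.0%} fewer input tokens")
```

Shorter agent responses reduce `tokens_per_turn` as well, which compounds with the smaller system prompt.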

Token reduction savings: ~$0.20/min saved

Optimisation #4: Conversation flow tuning to reduce call length

Call length x cost per minute = total cost. Reducing average call length from 5 minutes to 3.5 minutes (a 30% reduction) is equivalent to a 30% cost reduction at the same per-minute rate.
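The same relationship as a worked example. The $1.00/min rate is a placeholder chosen to keep the arithmetic readable, and both function names are hypothetical.

```python
def call_cost(minutes: float, rate_per_min: float) -> float:
    """Total cost of one call: call length times per-minute rate."""
    return minutes * rate_per_min


def relative_saving(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


RATE = 1.00  # placeholder per-minute rate in USD
saving = relative_saving(call_cost(5.0, RATE), call_cost(3.5, RATE))
print(f"{saving:.0%} lower cost per call")
```

Because the rate cancels out, the percentage saving from shorter calls is the same whatever the per-minute price is.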

We analysed 200 call transcripts and found:

Time waster #1: Unclear transitions. The original script had vague transitions like "Great, let me ask you a few more questions." This led to confused pauses, leads asking "sorry, what was that?", and re-asks. We tightened every transition to be direct: "Okay, next question-"

Time waster #2: AI over-explaining. The AI was prompted to be conversational, so it would say things like "That's great to hear! Many homeowners in [state] have found that..." before asking the next question. Filler. Removed.

Time waster #3: Re-confirming information unnecessarily. At the end of the call, the original script had the AI repeat back 5 fields of information for confirmation. We cut this to the two most important fields (name and callback number) and moved the rest to an automatic SMS confirmation.

Time waster #4: Handling wrong numbers incorrectly. When someone said "I think you have the wrong number," the original AI tried to verify the lead's information anyway. This created a painful 45-second conversation before the AI gave up. We added explicit intent detection: if the lead expresses confusion about being called, end gracefully within 15 seconds.
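A minimal sketch of the wrong-number check, assuming a simple phrase match on the caller's transcribed utterance. In practice the detection might be handled by the LLM itself or by a platform feature; the patterns, `is_wrong_number`, and `next_action` are hypothetical and would need tuning against real transcripts.

```python
import re

# Hypothetical phrases that suggest the caller does not expect this call.
WRONG_NUMBER_PATTERNS = [
    r"\bwrong number\b",
    r"\bwho is this\b",
    r"\bnever (signed up|requested|asked)\b",
]
_WRONG_NUMBER_RE = re.compile("|".join(WRONG_NUMBER_PATTERNS), re.IGNORECASE)


def is_wrong_number(utterance: str) -> bool:
    """True if the utterance matches any wrong-number phrase."""
    return bool(_WRONG_NUMBER_RE.search(utterance))


def next_action(utterance: str) -> str:
    """Exit early on confusion about the call; otherwise keep to the script."""
    return "end_call_politely" if is_wrong_number(utterance) else "continue_script"


print(next_action("I think you have the wrong number"))
```

Running the check before the next scripted question is what keeps the exit short, since the agent never starts a verification loop.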

Average call length reduction: 4.2 min -> 3.1 min (-26%)

Call length cost reduction: ~$0.30/min equivalent

Optimisation #5: Smarter call scheduling

This one doesn't reduce per-minute cost - it reduces wasted calls entirely.

When we analysed our call data, we found:

Calls made Monday-Friday 10am-11am and 5pm-6:30pm local time had a 67% answer rate
Calls made Monday-Friday 2pm-4pm had a 31% answer rate
Weekend calls had a 22% answer rate

We weren't failing to connect because of the AI agent quality. We were burning call volume on bad time windows.
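The breakdown above can be reproduced from raw call logs with a small grouping step. This is a sketch under the assumption that each call record carries a time-window label and an answered flag; the window labels, sample records, and `answer_rate_by_window` are hypothetical.

```python
from collections import defaultdict


def answer_rate_by_window(calls):
    """calls: iterable of (window_label, answered) pairs.
    Returns a dict mapping each window to its answer rate (0.0 to 1.0)."""
    totals = defaultdict(int)
    answered = defaultdict(int)
    for window, was_answered in calls:
        totals[window] += 1
        answered[window] += int(was_answered)
    return {window: answered[window] / totals[window] for window in totals}


# Tiny made-up sample, purely to show the shape of the data.
sample_calls = [
    ("weekday_10_11", True), ("weekday_10_11", True), ("weekday_10_11", False),
    ("weekday_14_16", True), ("weekday_14_16", False),
    ("weekend", False),
]
rates = answer_rate_by_window(sample_calls)
best_windows = sorted(rates, key=rates.get, reverse=True)
print(best_windows[:2])
```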

By restricting calling hours to the two peak windows (and adding Saturday 10am-12pm as a test), our answer rate improved from 44% to 61%.

Same number of calls. 39% more conversations. Fewer calls going to voicemail (which still costs per minute even when nobody answers).

Effective cost reduction: 30% fewer unproductive call minutes
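Both percentages follow directly from the answer rates quoted above. The sketch below assumes unproductive minutes scale with the share of unanswered calls; the function names are hypothetical.

```python
def relative_uplift(before_rate: float, after_rate: float) -> float:
    """Relative increase in answered calls for the same call volume."""
    return (after_rate - before_rate) / before_rate


def unanswered_reduction(before_rate: float, after_rate: float) -> float:
    """Relative drop in the share of calls that go unanswered."""
    before_unanswered = 1 - before_rate
    after_unanswered = 1 - after_rate
    return (before_unanswered - after_unanswered) / before_unanswered


print(f"{relative_uplift(0.44, 0.61):.0%} more conversations")
print(f"{unanswered_reduction(0.44, 0.61):.0%} fewer unanswered calls")
```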

The final cost breakdown

| Component | Before | After | Saving |
|-----------|--------|-------|--------|
| LLM | $0.65/min | $0.05/min | $0.60 |
| TTS | $0.30/min | $0.04/min | $0.26 |
| STT | $0.12/min | $0.08/min | $0.04 |
| Platform | $0.18/min | $0.10/min | $0.08 |
| Telephony | $0.05/min | $0.05/min | $0.00 |
| Prompt optimisation | - | - | $0.17 effective |
| Total | $1.50/min | $0.35/min | $1.15/min (a significant reduction) |

Note: The prompt optimisation saving is an effective figure. It shows up as lower LLM billing rather than as a separate line item, and the component rows are approximate, so they do not sum exactly to the totals.

What didn't change

Qualification accuracy. We tracked the rate at which leads qualified by the AI agent were confirmed as qualified by human closers. Before: 84%. After: 82%. Noise-level difference.

Caller experience. We surveyed a sample of leads post-call. No meaningful change in how the call was perceived.

Booking rate. Leads who passed qualification and were offered a calendar booking: consistent at 41%.

The entire optimisation was a pure cost reduction. The output - qualified leads delivered to closers - was essentially unchanged.

The replication checklist

If you're running a VAPI-based AI calling operation and want to apply these:

Audit your LLM selection. Are you using GPT-4 for a structured script? Switch to GPT-4o Mini or Claude Haiku. Test with 50 calls before committing.
Audit your TTS provider. Are you on ElevenLabs? Test Cartesia or Rime. Run the same script on both, listen to 10 calls on each, compare.
Trim your system prompt. Paste your current prompt into a token counter. If it's over 800 tokens for a qualification use case, you have bloat. Cut everything that isn't directly needed for the call.
Analyse call transcripts for time wasters. Download 50 transcripts. Read them. You'll find the patterns quickly.
Check your call timing data. What's your answer rate by hour of day and day of week? Restrict calling to the top two windows.
Calculate your actual per-call and daily cost. Most operators don't know these numbers. Calculate: (avg call length x cost/min) = cost per call, and (calls/day x cost per call) = daily infrastructure cost. Make it visible.
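The last checklist item as a small calculator. The call volume, call length, and rate below are placeholder inputs, not client figures, and the function names are hypothetical.

```python
def per_call_cost(avg_call_minutes: float, cost_per_min: float) -> float:
    """Average infrastructure cost of a single call in USD."""
    return avg_call_minutes * cost_per_min


def daily_cost(calls_per_day: int, avg_call_minutes: float, cost_per_min: float) -> float:
    """Daily infrastructure cost: calls/day x avg call length x cost/min."""
    return calls_per_day * per_call_cost(avg_call_minutes, cost_per_min)


# Placeholder inputs: 500 calls/day, 3.1 minutes average, $0.35/min.
print(f"${per_call_cost(3.1, 0.35):.2f} per call")
print(f"${daily_cost(500, 3.1, 0.35):.2f} per day")
```

Comparing this modelled figure with the real invoice is a quick way to spot a layer you forgot to include.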

The cost of running this properly matters. At scale, even a small per-minute difference compounds quickly. The safe way to budget is to model each provider layer, run a controlled call sample, and review the real invoice before increasing volume.

Want help auditing and optimising your calling stack? Get in touch.

Sources and verification

This article was reviewed in May 2026. Vendor pricing, platform features, ad policies, and telemarketing rules change often, so operational or budget decisions should be checked against the current source pages below before implementation.

Private client metrics, lead counts, appointment counts, cost reductions, and revenue examples are intentionally removed, softened, or framed as modeled examples unless they can be verified publicly without exposing client data.
