Voice AI Phone Receptionist for Businesses
AI receptionist that answers business phone calls 24/7, routes by intent, books appointments, and captures leads — replacing the always-on front-desk role at a fraction of human cost.
From a phone call to a booked appointment.
A caller dials the business. Twilio streams the audio to a real-time speech-to-text service, which feeds an LLM agent grounded in the business's RAG knowledge base. The agent answers in real-time TTS, books appointments straight into Google Calendar, and writes leads to the CRM — all inside one continuous call.
Incoming call
Northwind Plumbing · main line · 24/7 AI receptionist
This is an animated mockup of the voice-AI-receptionist capability — not a live product. Business names, phone numbers, and callers are illustrative.
Twilio telephony
The business phone number routes through Twilio. Inbound audio is streamed to the agent in real time, and outbound TTS audio streams back the same way — no IVR menus.
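As a minimal sketch of how the streaming leg could be wired up, the snippet below builds the TwiML that Twilio would receive when the call arrives. It assumes the bidirectional `<Connect><Stream>` verb and a hypothetical WebSocket endpoint (`wss://agent.example.com/media`); the page does not say how the production webhook is implemented.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that connects the call to a bidirectional media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Connect><Stream> opens a two-way WebSocket: caller audio flows out,
    # and audio sent back over the same socket is played to the caller.
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


# Hypothetical endpoint (an assumption); replace with the agent's real URL.
twiml = build_stream_twiml("wss://agent.example.com/media")
print(twiml)
```

The webhook that answers the inbound call would return this XML, after which all audio moves over the WebSocket rather than through menu prompts.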
Real-time speech-to-text
A streaming STT service transcribes the caller turn by turn with partial hypotheses, so the agent can start thinking before the sentence is finished.
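One way to picture partial hypotheses is a small buffer that keeps locked-in text separate from the current guess. The event shape below (`text`, `is_final`) is a hypothetical stand-in; real STT vendors use their own field names.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptEvent:
    text: str
    is_final: bool  # hypothetical flag; vendors name this differently


@dataclass
class TurnBuffer:
    committed: list[str] = field(default_factory=list)
    draft: str = ""

    def apply(self, event: TranscriptEvent) -> None:
        if event.is_final:
            # A final result replaces the draft and is locked in.
            self.committed.append(event.text)
            self.draft = ""
        else:
            # A partial hypothesis overwrites the previous guess.
            self.draft = event.text

    def best_guess(self) -> str:
        """Text the agent can start reasoning on before the turn ends."""
        return " ".join(self.committed + ([self.draft] if self.draft else []))


buffer = TurnBuffer()
for ev in [
    TranscriptEvent("I need a", False),
    TranscriptEvent("I need a plumber", False),
    TranscriptEvent("I need a plumber tomorrow.", True),
    TranscriptEvent("Around nine", False),
]:
    buffer.apply(ev)

print(buffer.best_guess())  # I need a plumber tomorrow. Around nine
```

The agent can begin retrieval on `best_guess()` while the caller is still speaking, then confirm once the final result arrives.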
RAG knowledge base
Business hours, services, prices, and policies are indexed into a vector store. Each agent turn retrieves the relevant chunks so answers stay grounded — not improvised.
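To make per-turn retrieval concrete, here is a toy sketch that ranks chunks by cosine similarity. It swaps in word counts for real embeddings and a Python list for the vector store, and the chunk text is illustrative rather than real business data.

```python
import math
import re
from collections import Counter

# Illustrative knowledge-base chunks, not real business data.
CHUNKS = [
    "Opening hours: Monday to Friday, 8am to 6pm.",
    "Drain cleaning is a standard service with a fixed call-out fee.",
    "Cancellation policy: cancel at least 24 hours before the appointment.",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the caller's question."""
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


top = retrieve("What are your opening hours on Monday?")
print(top[0])
```

The retrieved chunks would then be placed in the prompt for that turn, which is what keeps the answer tied to the indexed facts.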
LLM function calling
The agent decides when to answer in voice and when to call a tool — book_appointment, upsert_lead, transfer_to_human — keeping the conversation and the action in one flow.
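The speak-or-act decision can be sketched as a small dispatcher. The three tool names come from the text above; the stub bodies, the argument names, and the shape of the model output (`type`, `name`, `arguments` as a JSON string) are assumptions for illustration.

```python
import json
from typing import Callable


# Stub tools: the names come from the text, the bodies are placeholders.
def book_appointment(name: str, start_iso: str) -> dict:
    return {"status": "booked", "name": name, "start": start_iso}


def upsert_lead(name: str, phone: str) -> dict:
    return {"status": "saved", "name": name, "phone": phone}


def transfer_to_human(reason: str) -> dict:
    return {"status": "transferring", "reason": reason}


TOOLS: dict[str, Callable[..., dict]] = {
    "book_appointment": book_appointment,
    "upsert_lead": upsert_lead,
    "transfer_to_human": transfer_to_human,
}


def handle_model_output(output: dict) -> dict:
    """Route one model turn: speak the text, or run the requested tool."""
    if output.get("type") != "tool_call":
        return {"speak": output["text"]}
    tool = TOOLS.get(output["name"])
    if tool is None:
        # Unknown tool name: hand the call to a person rather than guess.
        return {"tool_result": transfer_to_human("unknown tool requested")}
    return {"tool_result": tool(**json.loads(output["arguments"]))}


# Illustrative turns; the reply text and caller details are made up.
spoken = handle_model_output({"type": "text", "text": "We open at 8am."})
booked = handle_model_output({
    "type": "tool_call",
    "name": "book_appointment",
    "arguments": '{"name": "Dana", "start_iso": "2025-03-04T09:00:00"}',
})
print(spoken, booked)
```

In a real loop the tool result would be fed back to the model so it can confirm the booking out loud in the same call.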
Calendar + CRM integration
Successful bookings create a real Google Calendar event. New callers land in the CRM with the conversation context attached — the receptionist isn't just answering, it's converting.
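For the calendar leg, a booking ultimately becomes an event resource sent to the Google Calendar API. The sketch below only builds that request body, using the API's `summary`, `start`, and `end` fields; the helper name, the example caller, and the one-hour duration are assumptions, and the authenticated insert call is left out.

```python
from datetime import datetime, timedelta


def build_calendar_event(
    caller_name: str,
    service: str,
    start: datetime,
    duration_minutes: int,
    time_zone: str,
) -> dict:
    """Build an event body in the shape the Calendar API accepts."""
    end = start + timedelta(minutes=duration_minutes)
    return {
        "summary": f"{service}: {caller_name}",
        "description": "Booked by the AI receptionist.",
        # With timeZone set, dateTime may omit the UTC offset.
        "start": {"dateTime": start.isoformat(), "timeZone": time_zone},
        "end": {"dateTime": end.isoformat(), "timeZone": time_zone},
    }


# Illustrative values only.
event = build_calendar_event(
    caller_name="Dana",
    service="Drain inspection",
    start=datetime(2025, 3, 4, 9, 0),
    duration_minutes=60,
    time_zone="America/Chicago",
)
print(event["start"], event["end"])
```

The same dictionary, plus the call transcript or summary, is a natural payload to attach to the CRM record so both systems describe the same booking.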
Real-time text-to-speech
The agent's reply streams to a TTS vendor, which returns audio chunks the caller hears almost immediately. Round-trip latency stays low enough that the exchange still feels like a natural conversation.
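A common way to keep the spoken reply fast is to send text to TTS one sentence at a time instead of waiting for the full answer. The sketch below assumes the LLM yields string tokens and uses a deliberately naive sentence boundary check; it is not a description of the production pipeline.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as its closing punctuation arrives."""
    pending = ""
    for token in tokens:
        pending += token
        if pending.rstrip().endswith(SENTENCE_END):
            yield pending.strip()
            pending = ""
    # Flush whatever is left if the stream ends without punctuation.
    if pending.strip():
        yield pending.strip()


# Illustrative token stream standing in for a streaming LLM reply.
llm_tokens = ["Sure", ",", " we", " can", " help", ".", " What", " time", " works", "?"]
chunks = list(sentences_from_tokens(llm_tokens))
print(chunks)  # ['Sure, we can help.', 'What time works?']
```

Each yielded sentence can go to the TTS vendor while the model is still generating the next one. A real splitter would also need to handle prices, abbreviations, and other mid-sentence periods.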
Twilio receives the call; speech-to-text streams the caller's words to an LLM agent grounded in the business's RAG knowledge base; the agent responds via real-time TTS. Workflow automation handles bookings into Google Calendar, lead writes into the CRM, and email follow-up.
A caller dials the business number. Twilio routes the audio to a real-time speech-to-text service that streams transcripts to an LLM agent. The agent has read-only access to a RAG knowledge base built from the business's hours, services, prices, and policies, so answers stay grounded in the actual business rather than improvised. When the caller asks for a booking, an agent function writes a Google Calendar event. New leads land in the business's CRM. End-to-end latency stays low enough that the exchange still feels like a natural conversation.
How a request flows through it
Each request enters at the top of the diagram, passes through each box in order, and lands at the bottom, mirroring the path a production call would take. The scan-line traces where a live request would be right now.
What it's built with
The interesting parts
RAG-grounded answers
Business hours, services, prices, and policies are indexed into a vector store and retrieved on every turn, so answers stay accurate to the actual business instead of being improvised by the LLM.
Real-time voice loop
Speech-to-text streams to the LLM; the LLM's reply streams to text-to-speech. Round-trip latency stays low enough that the call feels like a conversation, not like talking to a slow bot.
Bookings + CRM in one flow
Successful booking intents write a real calendar event; new leads land in the business's CRM with the conversation context attached. The receptionist isn't just answering — it's converting.
Best-of-breed vendor stack
Twilio for telephony, dedicated speech-to-text and TTS vendors, and workflow orchestration for the after-call steps. Each leg uses a specialist rather than relying on a single all-in-one platform.
The calls that did most of the work
A handful of engineering choices shape how a system feels. Here are the ones we'd still defend — alongside what each one cost.
RAG over fine-tuning for the business knowledge
A business's hours, services, and prices can change from week to week; fine-tuning would mean retraining on every update. RAG lets the model stay general and the knowledge stay current.
Tradeoff: Slightly higher per-call cost and an extra retrieval hop compared to a fine-tuned baseline.
n8n for orchestration over custom code
The 'after-the-call' workflow — calendar event, CRM record, fallback to human — changes frequently; a visual orchestrator absorbs those changes without code edits.
Tradeoff: Adds a runtime dependency, and debugging crosses two systems when something fails.
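As a rough sketch of the hand-off between the agent and the orchestrator, the call summary can be posted as JSON to a workflow webhook. The URL, the payload fields, and the helper name are hypothetical; the page does not describe the actual n8n workflow or its trigger.

```python
import json
import urllib.request


def build_after_call_request(
    webhook_url: str, call_summary: dict
) -> urllib.request.Request:
    """Prepare a JSON POST that hands the finished call to the workflow."""
    body = json.dumps(call_summary).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical webhook URL and illustrative payload.
request = build_after_call_request(
    "https://n8n.example.com/webhook/after-call",
    {"intent": "booking", "caller": "Dana", "needs_human": False},
)
# urllib.request.urlopen(request, timeout=5) would send it;
# the call is omitted here so the sketch runs offline.
print(request.get_method(), request.full_url)
```

Keeping the payload small and explicit helps with the debugging cost noted above, since the same JSON can be logged on both sides of the boundary.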
Twilio + Deepgram + ElevenLabs (best-of-breed)
Each leg of the voice loop has a strong specialist; assembling a best-of-breed stack ships faster than building on a single all-in-one vendor.
Tradeoff: Three vendor contracts and three failure modes — vendor lock-in is replaced by vendor sprawl.
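One way to contain those three failure modes is to wrap each leg so that a vendor error falls back to a human hand-off. This is a sketch under assumptions: `VendorError`, `run_leg`, and `hand_off` are hypothetical names, and the real system's error handling is not described on this page.

```python
from typing import Callable


class VendorError(Exception):
    """Raised by a hypothetical vendor client when a leg of the loop fails."""


def run_leg(
    leg: str,
    call: Callable[[], str],
    on_failure: Callable[[str], str],
) -> str:
    """Run one leg of the voice loop; fall back if the vendor fails."""
    try:
        return call()
    except VendorError:
        return on_failure(leg)


def hand_off(leg: str) -> str:
    # Placeholder for the transfer_to_human tool.
    return f"transfer_to_human: {leg} unavailable"


def failing_tts() -> str:
    # Simulated outage for the example.
    raise VendorError("timeout")


result_ok = run_leg("stt", lambda: "transcript ready", hand_off)
result_failed = run_leg("tts", failing_tts, hand_off)
print(result_ok, "|", result_failed)
```

Only `VendorError` is caught, so programming mistakes still surface instead of being silently turned into transfers.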
Tell us what you're building.
Free 30-minute call. Real humans, real timelines, no endless follow-up emails.