An AI receptionist that answers your phone — and can't go off-script

Trigger: Twilio voice webhook · Pattern: speech → classify → route → TwiML · Sample: examples/use-cases/voice-receptionist.toml · Status: runs today, end to end — natively as of v1.2.0

The problem

Every business with a phone number has the same first sixty seconds: "Are you a customer? Do you want sales? Is this urgent?" A human receptionist does this beautifully and expensively. A phone tree does it cheaply and infuriatingly ("press 4 to hear these options again"). An LLM can do it conversationally — but handing your phone line to a free-running AI is how you end up in a screenshot: a caller talks the bot into promising a refund, quoting a fake discount, or transferring them to the CEO.

The interesting problem isn't making an AI answer the phone. It's making an AI answer the phone while provably unable to do anything you didn't script.

How a phone call becomes a workflow

Twilio's voice platform drives a call as a series of web requests: the caller speaks, Twilio transcribes the speech and POSTs it to your webhook, and your reply — a small XML document called TwiML — tells Twilio what to do next: say something, gather more speech, transfer to a human. A phone call is secretly a request/response loop, which is exactly the shape a bounded workflow engine eats for breakfast.
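
To make the loop concrete, here is a minimal Python sketch of the reply side of one turn. It is an illustration only, not the runtime's implementation: the function name gather_turn and the /voice action URL are assumptions. The TwiML verbs it emits (Response, Gather with input="speech", Say) are standard Twilio elements.

```python
import xml.etree.ElementTree as ET


def gather_turn(reply: str, action_url: str) -> str:
    """Render TwiML that speaks `reply`, then listens for the caller's next utterance.

    Hypothetical helper for illustration; the workflow runtime renders its
    own TwiML from a template.
    """
    response = ET.Element("Response")
    # <Gather input="speech"> asks Twilio to transcribe what the caller says
    # next and POST it to action_url, which starts the next loop iteration.
    gather = ET.SubElement(
        response,
        "Gather",
        {"input": "speech", "action": action_url, "method": "POST"},
    )
    # ElementTree escapes &, < and > in text, so the reply cannot inject markup.
    ET.SubElement(gather, "Say").text = reply
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(gather_turn("Thanks for calling. Is this about sales or support?", "/voice"))
```

Each POST from Twilio produces exactly one such document, so the whole call is a sequence of independent request/response pairs.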

One workflow file is the entire receptionist:

  1. Twilio POSTs the caller's transcribed speech to the authenticated route. The body is form-encoded and is parsed natively into the trigger payload. The route authenticates each request with the HTTP basic-auth credentials Twilio carries in the webhook URL (auth = "basic:twilio").
  2. One bounded llm_infer step classifies intent — sales, support, or "get me a human" — and drafts the next line to speak. The output is schema-enforced: the model must return {intent, reply} where intent is one of exactly three values. Not four. Not "well, actually". Three.
  3. A switch routes on the intent. The model doesn't choose the route — it produced a value, and a declared edge matches it.
  4. A respond node renders the TwiML for the chosen lane — speak the reply and gather the next turn, or speak and <Dial> the human desk. The transfer number is declared in the workflow, where no caller can reach it.
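  5. Every turn is appended to a per-call audit log: what was heard, what was decided, when.

The human-desk lane from step 4, for example, is a single respond node:

[[nodes]]
id = "transfer_to_human"
type = "respond"
# Twilio expects TwiML, so the reply is served as XML.
content_type = "text/xml"
# Speak the drafted reply, then dial the desk. The number is fixed here in the
# workflow file; nothing in the caller's speech or the model's output can change it.
body_template = """
<Response><Say>{{reply}}</Say><Dial>+15550100</Dial></Response>
"""
# Template fields come from the classify step's schema-enforced output.
input_from = "classify.parsed"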
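
Steps 2 and 3, which produce the value this node consumes, can be sketched in Python as a strict validator followed by a lookup table. This is a hedged illustration of the idea, not the runtime's code: the intent labels, the two "continue" node ids, and the function names are assumptions. Only transfer_to_human comes from the sample above.

```python
import json

# Hypothetical labels for the three intents named in step 2.
ALLOWED_INTENTS = ("sales", "support", "human")

# The switch: declared edges from an intent value to a node id.
# A value with no edge has nowhere to go.
ROUTES = {
    "sales": "continue_sales",      # hypothetical node id
    "support": "continue_support",  # hypothetical node id
    "human": "transfer_to_human",   # node id from the sample workflow
}


def parse_classification(raw: str) -> dict:
    """Accept only {"intent": <one of three>, "reply": <string>}; raise otherwise."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"intent", "reply"}:
        raise ValueError("expected exactly the keys 'intent' and 'reply'")
    if data["intent"] not in ALLOWED_INTENTS:
        raise ValueError(f"intent {data['intent']!r} is not a declared value")
    if not isinstance(data["reply"], str):
        raise ValueError("reply must be a string")
    return data


def route(parsed: dict) -> str:
    """Pick the next node by table lookup; the model never names a node."""
    return ROUTES[parsed["intent"]]


if __name__ == "__main__":
    parsed = parse_classification('{"intent": "human", "reply": "One moment."}')
    print(route(parsed))
```

The design point is that the model's output is treated as data to be checked, never as an instruction: an extra key such as "refund" or a fourth intent fails validation before any routing happens.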

No bridge service, no middleware — the runtime answers Twilio in Twilio's language. (Until v1.2.0 this took a small TwiML-rendering proxy; the gap analysis called it, and the respond node + form-encoded parsing + basic auth closed it.)
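
For readers who want to see what those two inbound pieces involve, here is a small Python sketch of form-encoded parsing and an HTTP basic-auth check, using only the standard library. It assumes nothing about how the runtime implements them: the function names and the sample credentials are hypothetical. SpeechResult, Confidence, and CallSid are fields Twilio sends with a speech Gather callback.

```python
import base64
import hmac
from urllib.parse import parse_qs


def check_basic_auth(header: str, user: str, password: str) -> bool:
    """Return True only if an Authorization header carries the expected credentials."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        supplied = base64.b64decode(encoded, validate=True)
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
    expected = f"{user}:{password}".encode()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(supplied, expected)


def parse_twilio_form(body: str) -> dict:
    """Decode an application/x-www-form-urlencoded body into a flat dict."""
    fields = parse_qs(body, keep_blank_values=True)
    # Keep the first value of each field; repeated fields are not expected here.
    return {name: values[0] for name, values in fields.items()}


if __name__ == "__main__":
    body = "CallSid=CA123&SpeechResult=I+need+help+with+my+order&Confidence=0.93"
    print(parse_twilio_form(body)["SpeechResult"])
```

Twilio sends the Authorization header when the webhook URL contains a username and password, which is why the credentials can live entirely in the URL configured on the phone number.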

Why you can trust it on your phone line

The security property is structural, and it's worth spelling out because it's the difference between this and a chatbot with a phone number:

Honest limits