Agent Phone: Teaching Claude Code to Call Me When It Gets Stuck

Here’s the problem with autonomous AI agents: they’re great until they need to ask you something.

I leave Claude Code running on long tasks all the time. Refactoring a module, building out infrastructure, working through a test suite. But I can’t always sit and watch. I step away, grab coffee, take the dog out, get pulled into a meeting. And inevitably, Claude hits a decision point while I’m gone. Architecture choice, ambiguous requirement, something that needs human judgment. It just sits there in the terminal waiting for input until I come back.

I wanted the agent to call me on my phone instead.

What I built

Agent Phone is a Claude Code plugin. When Claude finishes a turn and has a question, a stop hook fires, detects the question, and hits a local orchestration server. That server calls my phone via Twilio. I answer, have a voice conversation where Claude explains what it’s stuck on, I give my answer, and the agent resumes with my input. The whole round-trip takes 30 seconds instead of however long it takes me to notice the terminal.

It also supports SMS for when a phone call feels like too much. And you can switch modes mid-call. Say “text me next time” and it remembers.

The flow

The technical chain is:

1. Claude Code finishes a turn and blocks, waiting for input.
2. A stop hook script fires and detects the question in Claude’s last message.
3. The hook POSTs the question and context to the local Fastify server.
4. The server places an outbound call via Twilio (or sends an SMS).
5. Twilio connects a ConversationRelay WebSocket for real-time voice.
6. A separate Claude API call handles the phone conversation. It has the context from the stuck agent but runs independently.
7. The developer answers, and Claude confirms its understanding.
8. The server returns the answer to the blocking hook.
9. The original Claude Code session resumes.

Running the phone conversation through a separate Claude session turned out to be important. It keeps the voice call from touching the main agent’s context or state. The phone Claude just needs to extract a decision and pass it back through the hook.

Twilio ConversationRelay

This was the interesting technical piece. ConversationRelay is Twilio’s WebSocket protocol for real-time voice conversations with AI. Twilio handles speech-to-text and text-to-speech on its end; I handle the Claude API calls in between.

The server translates between Twilio’s ConversationRelay protocol and Claude’s messages API. When the developer speaks, Twilio sends a prompt message with the transcribed text. I feed that to Claude (with the injected context about what the agent is stuck on), get back a response, and send it over the WebSocket as a ConversationRelay text message (not an SMS) for Twilio to speak aloud.
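Here’s a stripped-down sketch of that translation step, not the plugin’s real handler. The message shapes (a prompt message carrying voicePrompt coming in, a text message carrying token going out) follow Twilio’s ConversationRelay docs as I understand them. The names transcriptFrom, spokenReply, onRelayMessage, and the injected askClaude function are made up for illustration.

```typescript
// Subset of the messages Twilio sends over the ConversationRelay WebSocket.
// Field names are my reading of Twilio's docs; treat them as assumptions.
type RelayIncoming =
  | { type: "setup"; callSid?: string }
  | { type: "prompt"; voicePrompt: string }
  | { type: "interrupt" }
  | { type: "error"; description?: string };

type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical stand-in for the real Claude API call.
type AskClaude = (history: Turn[]) => Promise<string>;

// Returns the transcribed speech for prompt messages, null for everything else.
function transcriptFrom(raw: string): string | null {
  const msg = JSON.parse(raw) as RelayIncoming;
  return msg.type === "prompt" ? msg.voicePrompt : null;
}

// Wraps Claude's reply in a text message for Twilio to speak aloud.
function spokenReply(text: string): string {
  return JSON.stringify({ type: "text", token: text, last: true });
}

// One conversational turn: transcript in, spoken reply out.
// Returns null when the incoming message needs no spoken response.
async function onRelayMessage(
  raw: string,
  history: Turn[],
  askClaude: AskClaude,
): Promise<string | null> {
  const transcript = transcriptFrom(raw);
  if (transcript === null) return null;
  history.push({ role: "user", content: transcript });
  const reply = await askClaude(history);
  history.push({ role: "assistant", content: reply });
  return spokenReply(reply);
}
```

In a real server, onRelayMessage would be called from the WebSocket message handler and its return value written back to the socket. Keeping the two mapping functions pure makes them easy to test without a live call.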

The system prompt for the voice Claude is specific: you’re a one-way messenger, get their answer, confirm it, end the call. It also watches for preference changes. If the developer says “just text me instead,” it updates the config file during the call.
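To make that concrete, here’s one way the prompt and the preference handling could be wired up. This is a sketch under assumptions, not the prompt or mechanism the plugin ships with: the wording of the prompt, the [PREF:...] marker convention, and the function names buildVoiceSystemPrompt and extractPreference are all invented for illustration.

```typescript
type ContactMode = "call" | "sms";

// Builds a system prompt for the phone-side Claude. The wording is illustrative only.
function buildVoiceSystemPrompt(question: string, context: string): string {
  return [
    "You are relaying a question from a coding agent to the developer over the phone.",
    "Explain the question briefly, get their answer, confirm it back, then end the call.",
    "If the developer asks to be contacted differently next time,",
    "append [PREF:sms] or [PREF:call] to your reply.",
    "",
    `Question from the agent: ${question}`,
    `Context: ${context}`,
  ].join("\n");
}

// Assumed convention: the model tags a channel change with a [PREF:...] marker.
// This strips the marker so it is never spoken, and reports the requested mode.
function extractPreference(reply: string): { spoken: string; mode: ContactMode | null } {
  const marker = /\[PREF:(call|sms)\]/;
  const match = reply.match(marker);
  const mode = match ? (match[1] as ContactMode) : null;
  const spoken = reply.replace(/\[PREF:(call|sms)\]/g, "").trim();
  return { spoken, mode };
}
```

When extractPreference returns a non-null mode, the server would write it to the config file before continuing the call. A tool call from the model would work just as well as a text marker; the marker is only the smallest thing that shows the idea.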

The mock mode that made this buildable

You can’t iterate on voice call flows by placing real phone calls every time. I built a mock Twilio server and a browser-based phone UI that simulates the entire flow locally. The mock phone uses Web Speech API for voice input, so you can actually talk to it in your browser.

Mock mode runs with docker compose --profile dev up. No Twilio credentials, no tunnel, no phone number needed. The real flow uses cloudflared to tunnel the local server to a public HTTPS URL for Twilio’s webhooks.

Setup and configuration

ConversationRelay requires accepting the “Predictive and Generative AI/ML Features Addendum” in the Twilio console. There’s no API for this, so it’s a manual step. If you miss it, calls fail with a generic error message that doesn’t point you toward the cause. I spent a while on that one before finding the answer in the Call Events API.

SMS needs toll-free verification, a regulatory review that takes a few business days. Voice works right away, but SMS stays unavailable until the review completes. The setup wizard auto-submits the verification request, so the only thing left to do is wait.

I built an interactive setup wizard (npm run setup) that automates everything else: validates Twilio credentials, searches for available toll-free numbers, purchases one, submits SMS verification, detects and installs cloudflared, and places a test call.

The stop hook

Claude Code’s hook system is what makes this possible. A stop hook fires every time Claude finishes a turn and would normally wait for user input. The hook script checks if Claude’s last message contains a question, checks if mobile mode is enabled, and if both are true, POSTs to the server and blocks waiting for the response.

The hook doesn’t return until the phone call completes and the developer provides an answer. From Claude Code’s perspective, it’s like the user typed a response. It just took a phone call to get there.
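The blocking behavior can be sketched roughly like this. It assumes Claude Code’s stop hook contract as I understand it: the hook can print JSON with decision set to "block" and a reason, and the reason is handed back to Claude so it keeps working. The endpoint URL, the { answer } response shape, and the function names are hypothetical, and the HTTP call is injected as postJson so the sketch doesn’t depend on any particular client.

```typescript
// What the hook prints to stdout. An empty object lets Claude stop as usual.
interface HookDecision {
  decision?: "block";
  reason?: string;
}

// Injected transport so the sketch stays free of HTTP client details.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

// Hypothetical local endpoint; the real server's route may differ.
const ASK_URL = "http://localhost:3000/ask";

// Pulls a string answer out of an untyped server response, if there is one.
function extractAnswer(response: unknown): string | null {
  if (typeof response === "object" && response !== null && "answer" in response) {
    const answer = (response as { answer: unknown }).answer;
    return typeof answer === "string" ? answer : null;
  }
  return null;
}

// Turns the developer's answer into the JSON the hook should emit.
function toHookDecision(answer: string | null): HookDecision {
  if (answer === null || answer.trim() === "") return {};
  return {
    decision: "block",
    reason: `The developer answered by phone: ${answer.trim()}`,
  };
}

// Full hook flow: only reach out when mobile mode is on and the message is a question.
async function runStopHook(
  lastMessage: string,
  mobileModeEnabled: boolean,
  isQuestion: (text: string) => boolean,
  postJson: PostJson,
): Promise<HookDecision> {
  if (!mobileModeEnabled || !isQuestion(lastMessage)) return {};
  // This await is the "block": it resolves only when the call or SMS exchange finishes.
  const response = await postJson(ASK_URL, { question: lastMessage });
  return toHookDecision(extractAnswer(response));
}
```

The hook script itself would read the last message from the session transcript, call runStopHook, and print JSON.stringify of the result. If the call fails or the developer never answers, returning an empty object falls back to the normal behavior of waiting in the terminal.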

The question detection uses some heuristics: looking for question marks, phrases like “should I” or “would you prefer,” and checking that the message isn’t just a status update with a rhetorical question.
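A minimal sketch of heuristics in that spirit is below. The specific patterns and status openers are illustrative guesses, not the lists the plugin actually uses, and needsHumanInput is a hypothetical name.

```typescript
// Phrases that almost always mean the agent wants a decision.
// Word boundaries keep "should i" from matching "should it".
const DECISION_PATTERNS: RegExp[] = [
  /\bshould i\b/,
  /\bwould you prefer\b/,
  /\bdo you want me to\b/,
];

// Openers that suggest a status update, where a trailing "?" is likely rhetorical.
const STATUS_OPENERS: string[] = ["done", "finished", "completed"];

function needsHumanInput(message: string): boolean {
  const text = message.trim().toLowerCase();
  if (text.length === 0) return false;

  // An explicit decision phrase wins, even inside a status update.
  if (DECISION_PATTERNS.some((pattern) => pattern.test(text))) return true;

  // Otherwise require the final paragraph to end with a question mark...
  const paragraphs = text.split(/\n\s*\n/);
  const lastParagraph = paragraphs[paragraphs.length - 1] ?? "";
  const endsWithQuestion = lastParagraph.trim().endsWith("?");

  // ...and the message not to open like a status report.
  const looksLikeStatus = STATUS_OPENERS.some((opener) => text.startsWith(opener));
  return endsWithQuestion && !looksLikeStatus;
}
```

Heuristics like these will misfire in both directions. A false positive costs a short phone call, while a false negative leaves the agent waiting in the terminal, so leaning toward calling seems like the safer default.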

What shipped

Three commits, about 1,600 lines of code:

Fastify orchestration server with HTTP and WebSocket endpoints
Twilio integration for outbound calls and SMS
ConversationRelay WebSocket handler for real-time voice
Dual Claude provider support (Anthropic API and AWS Bedrock)
Five slash commands for mode control (/going-mobile, /call-me, /text-me, /stop, /call-status)
Stop hook with question detection
Mock Twilio server and browser phone UI for local development
Interactive setup wizard for Twilio provisioning
Docker Compose with dev and prod profiles
51 tests

What’s next

The part I’m most interested in going forward is the general pattern: an agent that can reach out to you through whatever channel makes sense, get a decision, and keep going. Right now it’s phone calls and SMS. The architecture (stop hook detects a block, server handles outreach, response flows back) would work for Slack, Teams, push notifications, whatever.
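As a rough sketch of where that could go, each channel could sit behind one small interface so the server doesn’t care how the question gets delivered. None of this exists in the plugin today: the Channel interface, orderChannels, askHuman, and the fall-through-on-failure behavior are assumptions about one possible design.

```typescript
// Hypothetical abstraction: anything that can deliver a question and return an answer.
interface Channel {
  name: string;
  ask(question: string): Promise<string>;
}

// Puts the preferred channel first and keeps the rest in their configured order.
function orderChannels(channels: Channel[], preferred: string): Channel[] {
  const first = channels.filter((channel) => channel.name === preferred);
  const rest = channels.filter((channel) => channel.name !== preferred);
  return [...first, ...rest];
}

// Tries each channel in order until one returns an answer.
async function askHuman(
  question: string,
  channels: Channel[],
  preferred: string,
): Promise<string> {
  for (const channel of orderChannels(channels, preferred)) {
    try {
      return await channel.ask(question);
    } catch {
      // Channel failed (no answer, delivery error); fall through to the next one.
    }
  }
  throw new Error("No channel could reach the developer");
}
```

With this shape, the stop hook and server stay the same and a Slack or push-notification integration becomes one more Channel implementation.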

I’ve been using it for a couple of weeks and the thing I notice most is that I’m more willing to let Claude run on bigger tasks. Before, I’d scope things tightly because I knew I’d need to babysit. Now I can kick off something ambitious and go do other things, knowing it’ll call if it gets stuck.

Built with Claude Code in one sitting. 3 commits, ~1,600 lines.