VibePod: Mistral Vibe on a HomePod

A note on authorship and attribution: The whole thing was a back-and-forth with Claude (mostly Opus 4.8): iterative and occasionally messy. I'd describe what I wanted or paste a log, Claude would write the code or dig through source. I'd make an "optimization" which made things worse, Claude would fix it. Sometimes I had to paste an extra log to convince Claude that "No, I did not forget to reload the service." A true team effort right there 😂

I have a few HomePods around the house and a full, native Apple Home setup: no Home Assistant, no custom hub, just Apple's stuff. I also have a Mistral Vibe Pro yearly subscription I got a while back and mostly forgot about. At this point, the whole "Siri dumb" thing has been memed enough, so I don't think I need to explain that part any further. I wanted to somehow integrate the otherwise unused Mistral subscription with the HomePod, mostly to answer general knowledge questions, instead of getting the infamous "Here's some results I found online, I can show them if you ask from your phone".

Then I read Alexis Gallagher's ClawPod post — a tiny bridge that lets you talk to an OpenClaw agent through a HomePod. ClawPod is the direct inspiration and structural basis for this; mine just swaps the "brain". Where his routes to OpenClaw, mine routes to the vibe CLI. I'm calling it VibePod, because of course I am.

01 What you get

You say "Hey Siri, ask Vibe…" (the shortcut should ideally be named something that Siri can easily understand), you ask your question out loud, and a few seconds later the HomePod speaks the answer back in Siri's voice. "What are the top ten cities in Greece, and are any known for wine?" "Quick recipe for a Bermuda rum swizzle?" It answers, conversationally, no screen involved.

Does it work? Yes. Does it Just Work™? Kinda — and the reasons are the same ones Alexis hit, because they're baked into how Apple's stack works, not into the bridge:

Siri's voice. The reply comes out in Siri's TTS. You can't swap in a fancier voice and route it to the HomePod.
Indirect invocation. You're really asking Siri to run a Shortcut, not talking to the agent directly. Naming matters — pick something phonetically distinct from your contacts and other shortcuts.
Latency. You're paying for text generation plus speech conversion, so it's a beat slower than a native voice assistant.
Flakiness. Getting the HomePod to reliably trigger a Shortcut by voice can feel a tad wobbly. Validate that before building anything else.

For one-shot questions — trivia, general knowledge, a quick recipe — it's pretty nice.

02 How it works

The shape is lifted straight from ClawPod: the HomePod can't run your code, so it proxies through a Shortcut on your iPhone (via Apple's Personal Content feature), which POSTs to a little server you run at home, which shells out to the LLM and returns text to be spoken.

You
  │ (voice)
  ▼
HomePod / Siri
  │ (invokes, via Personal Content)
  ▼
iOS Shortcut  (runs on your iPhone, not the HomePod)
  │ (HTTP POST /chat)
  ▼
VibePod server  (Python / FastAPI, on my Arch box btw)
  │ (shells out)
  ▼
vibe -p  (Mistral Vibe CLI, programmatic mode)
  │
  ▼
reply text ──► back up the chain ──► HomePod speaks

The one meaningful difference from ClawPod is the last hop. Where ClawPod calls openclaw agent, VibePod calls vibe -p — Mistral Vibe's programmatic mode. I asked Claude to read through Vibe's source to figure out the cleanest way to drive it headlessly, and the nice surprise was --output text: instead of a JSON envelope you have to dig through, stdout is the answer. The server's parsing logic is one line.

03 The build (and a dead end)

I didn't start with Vibe. My first instinct was Claude Code. Claude (the chat version) and I got a working version going quickly, but trying to make it lean ran into a wall: Claude Code's --bare flag, which strips its big tool-loading context, currently doesn't support subscription/OAuth auth. It reports "Not logged in" even when you are. In other words, it requires a pay-per-token API key, which was a hard "No" for me. Without --bare, every single query, even something like "Say Hi", consumed ~17,000 tokens, because Claude Code's built-in tool definitions loaded every time it was invoked, even with the -p flag. So --bare was off the table, and without it the per-query context tax was too large.

That's when we pivoted to Vibe. That worked surprisingly well: programmatic mode is a proper first-class feature (not a shell hack), --output text makes parsing trivial, and auth is dead simple. Vibe reads credentials from ~/.vibe/, so once you've logged in there's no token juggling. I asked Claude to write the whole server; it's a FastAPI app of a couple hundred lines, and almost all of it is plumbing the Shortcut already expects.

One small design change compared to Alexis' ClawPod: ending a conversation costs zero tokens. Instead of asking the model to detect "goodbye" in its reply (and burning a round-trip to do it), the server first checks the incoming text against a list of end-phrases, and if it matches, it returns "Goodbye!" instantly without ever calling the LLM.
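
The end-phrase check can be sketched roughly like this. It is a minimal illustration, not the code from vibepod_server.py: the phrase list, the normalize helper, and the ask_llm callback are all hypothetical names chosen for the example.

```python
import string

# Hypothetical end-phrases; the real list lives in the server source.
END_PHRASES = {"goodbye", "bye", "stop", "that's all", "thanks, that's all"}


def normalize(text: str) -> str:
    """Lowercase, trim, and strip punctuation from both ends ("Goodbye!" -> "goodbye")."""
    return text.strip().lower().strip(string.punctuation + " ")


def handle_message(text: str, ask_llm) -> str:
    """Return a canned reply for end-phrases; otherwise defer to the model."""
    if normalize(text) in END_PHRASES:
        return "Goodbye!"  # no LLM call, so no tokens spent
    return ask_llm(text)
```

Because the check runs before anything is sent to the model, a "goodbye" is answered locally and costs nothing; every other message passes through to ask_llm unchanged.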

04 The token rabbit hole

This is the "OK it works, but now I need to optimize it" part. Once everything worked, I got curious about how much each question actually cost, and Claude and I went down a slightly comical hole chasing the number down. There's an old Chernomyrdinka ("we wanted it to be better, but it turned out as always") that definitely applies here.

Vibe is a coding agent, so by default every query hauls a big system prompt full of coding instructions plus the schemas for ten tools (bash, edit, grep…) — none of which a HomePod needs to tell you about Greek wine. The journey looked like this:

| Configuration | Tokens / query |
| Default Vibe (coding prompt, medium model, "high" thinking) | 2,321 |
| After "optimizing" (I dropped a flag and accidentally loaded all ten tools 🙃) | 5,882 |
| Custom voice system prompt + stripped the extras | 1,721 |
| Switched to the small model, one tool, no thinking | 343 |

That second row is the "turned out as always" moment: in trying to slim things down I removed the web-search-only tool restriction, which loaded all ten tools and more than doubled the token count (2,321 to 5,882). Thankfully Claude caught it from the session logs, and we put the restriction back. The optimizations that actually mattered were using a tiny custom system prompt (Vibe lets you drop a Markdown file in ~/.vibe/prompts/ and point at it: no source edits, survives updates), turning off the coding-agent context sections, and switching from the medium model with reasoning to the small model with thinking off.

Tokens / query: 343
Cost / query: ~$0.00004
Response time: 0.8s
Cheaper than start: 122×

At roughly four-thousandths of a cent per question, I'd have to ask about 25,000 questions to spend a dollar. Remember, this isn't billed via the API; it uses my otherwise mostly idle Vibe Pro subscription. It's also fast enough to feel snappy through the speaker.
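
For anyone who wants to check the arithmetic, here is a small sketch using only the figures quoted in this section. The cost is held as an integer number of micro-dollars (a choice made for this example) so the division stays exact. The 122× figure is not derived here, since it presumably depends on per-model pricing that isn't listed in the post.

```python
# Figures quoted in the post.
COST_PER_QUERY_MICRODOLLARS = 40  # ~$0.00004, as an integer to avoid float noise
TOKENS_DEFAULT = 2_321
TOKENS_LEAN = 343

MICRODOLLARS_PER_DOLLAR = 1_000_000
MICRODOLLARS_PER_CENT = 10_000

queries_per_dollar = MICRODOLLARS_PER_DOLLAR // COST_PER_QUERY_MICRODOLLARS
cents_per_query = COST_PER_QUERY_MICRODOLLARS / MICRODOLLARS_PER_CENT
token_reduction = TOKENS_DEFAULT / TOKENS_LEAN

print(f"{queries_per_dollar:,} queries per dollar")          # 25,000 queries per dollar
print(f"{cents_per_query} cents per query")                  # 0.004 cents per query
print(f"{token_reduction:.1f}x fewer tokens than default")   # 6.8x fewer tokens than default
```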

05 Setting it up

Three files live in the vibepod-src repo: vibepod_server.py (the FastAPI proxy), vibepod.service (a systemd unit so it survives reboots), and voice.md (the lean system prompt). The iOS Shortcut itself is unchanged from ClawPod — just point its URL at your server.

1 · Get Vibe running

Install and log in by following Mistral's install & setup guide. (I used uv tool install mistral-vibe, then logged in once so credentials land in ~/.vibe/.) The server runs as the same user, so it inherits that login — no API keys to wire up.

2 · Run the server

# on the box that runs `vibe`, reachable on your LAN
uv run vibepod_server.py
# sanity check
curl http://<your-server-ip>:7001/health

Under the hood, each request becomes roughly:

# --output text: stdout is the answer
# --continue: resumes that speaker's thread
vibe -p "<your question>" \
  --output text \
  --trust --workdir "<per-speaker dir>" \
  --continue \
  --max-turns 3
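
On the Python side, the shell-out might look something like the sketch below. It is not the actual vibepod_server.py: the function names, the sessions directory, and the 60-second timeout are assumptions made for illustration, and the flags are simply copied from the command above rather than checked against Vibe's documentation. How Vibe behaves with --continue when a speaker has no earlier session is not covered here.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical base directory; one subdirectory per speaker keeps threads apart.
SESSIONS_ROOT = Path.home() / ".vibepod" / "sessions"


def speaker_workdir(speaker: str, root: Path = SESSIONS_ROOT) -> Path:
    """Map a speaker name to a filesystem-safe directory path."""
    safe = re.sub(r"[^a-z0-9_-]+", "-", speaker.strip().lower()).strip("-")
    return root / (safe or "default")


def build_vibe_command(question: str, workdir: Path, max_turns: int = 3) -> list[str]:
    """Build the argv list; the flags mirror the command shown in the post."""
    return [
        "vibe", "-p", question,
        "--output", "text",
        "--trust", "--workdir", str(workdir),
        "--continue",
        "--max-turns", str(max_turns),
    ]


def ask_vibe(question: str, speaker: str, root: Path = SESSIONS_ROOT,
             run=subprocess.run, timeout: float = 60.0) -> str:
    """Run vibe once and return its stdout as the reply text."""
    workdir = speaker_workdir(speaker, root)
    workdir.mkdir(parents=True, exist_ok=True)
    result = run(
        build_vibe_command(question, workdir),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()  # with --output text, stdout is the answer
```

Passing the question as a list element instead of interpolating it into a shell string means spoken text containing quotes or semicolons cannot be interpreted by a shell. The run parameter exists only so the function can be exercised without Vibe installed.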

3 · Make it lean

Drop voice.md into ~/.vibe/prompts/, then set these in the systemd unit. This is the difference between 2,321 tokens and 343:

Environment=VIBE_SYSTEM_PROMPT_ID=voice
Environment=VIBE_ACTIVE_MODEL=devstral-small
Environment=VIBE_INCLUDE_PROMPT_DETAIL=false
Environment=VIBE_INCLUDE_PROJECT_CONTEXT=false
Environment=VIBE_INCLUDE_COMMIT_SIGNATURE=false
Environment=VIBEPOD_ENABLED_TOOLS=web_search
Environment=VIBEPOD_MAX_TURNS=3

One nice touch in voice.md: it tells the model to answer from its own training data by default, and only reach for web search if you explicitly say something like "look that up." So casual questions are instant single-turn answers, and you opt into a slower web lookup only when you want one. Full file and a per-line walkthrough are in the repo's README.
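
The two VIBEPOD_-prefixed variables in the unit above look like they are meant for the server, not for Vibe itself. A server could read them along these lines. This is a sketch under assumptions: the VibePodSettings name, the comma-separated tool list format, and the fallback defaults are all invented for the example and may not match the repo.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VibePodSettings:
    enabled_tools: tuple[str, ...]
    max_turns: int


def load_settings(env=os.environ) -> VibePodSettings:
    """Read the VIBEPOD_* variables, falling back to the lean values from the post."""
    # Assumption: multiple tools would be given as a comma-separated list.
    raw_tools = env.get("VIBEPOD_ENABLED_TOOLS", "web_search")
    tools = tuple(t.strip() for t in raw_tools.split(",") if t.strip())
    max_turns = int(env.get("VIBEPOD_MAX_TURNS", "3"))
    if max_turns < 1:
        raise ValueError("VIBEPOD_MAX_TURNS must be at least 1")
    return VibePodSettings(enabled_tools=tools, max_turns=max_turns)
```

Validating at startup means a typo in the systemd unit fails loudly when the service starts instead of surfacing as an odd reply in the middle of a conversation.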

06 Was it worth it?

For the cost of an afternoon: yup. It turns a device I already own into one more way to reach a capable assistant. It's handy when you're on the couch watching a show and have a question about an actor or some bit of trivia.

Alexis put it well in his post, and I agree: this clearly feels like the future, it just hasn't fully shipped yet. You can see the shape of it. Strong world knowledge, cheap inference, voice in and voice out, on hardware that's already on your shelf. Today it's a Rube Goldberg machine of shortcuts and proxies. Eventually it'll just be how the speaker works.

Until then — VibePod. Built in an afternoon, costs nothing to run, and my barely-used Vibe subscription finally earns its keep.