pi-simple-voice: Give your agent a voice

15th June 2026

tl;dr: pi-simple-voice makes the pi coding agent speak its responses naturally. It streams each response to speech as it’s generated, completely locally, no frills. Install with pi install npm:pi-simple-voice.

I love evaluating local models, but they’re just not there for real coding work on my lil MacBook. So I have a little pi “secretary” setup, with skills for reading emails, maintaining my calendar and reminders, and so on. It’s neat, and it can shell out to actual Claude or Codex when “real work” comes up.

For this use case, it’s cute to just blab at the model in pi via speech-to-text and then (after the local model does its little work and finally comes back with a response) hear the next turn as natural speech from the background tab.

How it works

pi install npm:pi-simple-voice
The first time it speaks, it downloads a surprisingly good ~80MB voice model in the background.
That’s it! Run /voice any time to pick the voice, speed, or model.

The whole thing runs locally on Kokoro-82M, a genuinely tiny text-to-speech model that sounds far better than it has any right to.

A few things I did differently

This started as a fork of s1m0n38/pi-voice. It made the great choice of Kokoro-82M, but it required you to start and stop a separate HTTP API via its CLI, and it had a built-in summarisation step that added a huge delay, especially for local generations. So I focussed on the experience and on simplicity.

It manages the text-to-speech server inside pi. The extension spins up the API detached in the background, so it doesn’t slow pi down and you don’t have to manage it yourself. After 15 minutes (configurable), the TTS API exits on its own and releases its memory. If it isn’t running at the next completion, it’s auto-started in under a second.
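To make that lifecycle concrete, here is a minimal TypeScript sketch, assuming a Node-based extension and assuming the 15-minute window is measured from the last speech request. It is not pi-simple-voice’s actual code: `IdleTracker`, `startIdleWatchdog`, `spawnDetached`, and `ensureServer` are hypothetical names, and the health check and launch command are left as parameters because the post doesn’t describe the real ones. The only real library API used is Node’s `child_process.spawn` with `detached: true` plus `unref()`.

```typescript
import { spawn } from "node:child_process";

// Assumed default: 15 minutes, measured from the last speech request.
const DEFAULT_IDLE_MS = 15 * 60 * 1000;

// Server side (hypothetical): remembers when it last did any work.
class IdleTracker {
  private lastActivity: number;
  private readonly idleMs: number;

  constructor(idleMs = DEFAULT_IDLE_MS, now = Date.now()) {
    this.idleMs = idleMs;
    this.lastActivity = now;
  }

  // Call on every TTS request so an active session never times out.
  touch(now = Date.now()): void {
    this.lastActivity = now;
  }

  // True once no request has arrived for the whole idle window.
  isIdle(now = Date.now()): boolean {
    return now - this.lastActivity >= this.idleMs;
  }
}

// Server side: check periodically and exit when idle, freeing the model's memory.
function startIdleWatchdog(
  tracker: IdleTracker,
  pollMs = 60_000,
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    if (tracker.isIdle()) process.exit(0);
  }, pollMs);
}

// Extension side: launch the server as a detached background process so pi
// never waits on it. `command` and `args` stand in for however the server is started.
function spawnDetached(command: string, args: string[]): void {
  const child = spawn(command, args, { detached: true, stdio: "ignore" });
  child.unref(); // don't keep pi's event loop alive for the child
}

// Extension side: before speaking, start the server if it isn't answering,
// then poll the (hypothetical) health check until it is up or we give up.
async function ensureServer(
  isHealthy: () => Promise<boolean>,
  start: () => void,
  timeoutMs = 5_000,
  pollMs = 100,
): Promise<boolean> {
  if (await isHealthy()) return true;
  start();
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    if (await isHealthy()) return true;
  }
  return false;
}
```

Putting the shutdown decision in the server means nothing has to remember to stop it: pi can quit or crash and the process still cleans itself up. The extension’s only job is the cheap “is it up? if not, start it” check before each reply.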

It speaks verbatim. Especially with local models, running a whole new uncached completion just to summarise the output before TTS is a huge pain. If you’re using this, just prompt your agent to reason harder and give a more concise final response, suitable for voice output. Also, in my testing pi-voice’s summaries were cluttered with framing like “The human wants to … and the agent …”, which is counterproductive. Just speak the response, thanks.

Simplified human- and agent-facing tools. pi-voice has a /tts command and a tts() tool for the agent itself. Why would I want to /tts stuff myself? And we’re usually trying to reduce the number of tools exposed to the agent, not increase it. Bonus /voice menu simplifications: less eager model downloading. Bonus stuff I forgot: deleted a bunch of features you won’t miss.

Other stuff:

Kokoro is small, fast, and good. The default is a small quantisation, but if you want great, try the bigger quants.
It interrupts itself when you start your next turn.
To stream responses while still sounding natural, it splits the text at natural places like commas and sentence ends. It’s “pretty good”, but sometimes gets a little wacky on code.
Toggle voice output with option+v; a cute little ♩ icon is highlighted while voice output is enabled.
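The comma and sentence-end splitting can be sketched as a small buffer that accumulates streamed text and releases a chunk whenever it sees a break. This is a hypothetical TypeScript illustration, not the extension’s code: `SpeechChunker` and `minClauseLength` are invented names, and the rule of only splitting on a comma once enough text has built up is an assumption meant to avoid choppy two-word fragments.

```typescript
// Hypothetical sketch: buffer streamed text and release speakable chunks.
// Not pi-simple-voice's real implementation; names are illustrative.
class SpeechChunker {
  private buffer = "";
  private readonly minClauseLength: number;

  // minClauseLength: only split on a comma/semicolon/colon once at least
  // this many characters precede it (assumed heuristic to avoid choppy audio).
  constructor(minClauseLength = 40) {
    this.minClauseLength = minClauseLength;
  }

  // Feed one streamed delta; returns zero or more chunks ready for TTS.
  push(delta: string): string[] {
    this.buffer += delta;
    const chunks: string[] = [];
    let cut = this.findBreak();
    while (cut >= 0) {
      const chunk = this.buffer.slice(0, cut).trim();
      this.buffer = this.buffer.slice(cut);
      if (chunk.length > 0) chunks.push(chunk);
      cut = this.findBreak();
    }
    return chunks;
  }

  // Call when the response is complete to speak whatever is left over.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }

  // Returns the index just past the next break point, or -1 to keep buffering.
  private findBreak(): number {
    // Sentence end (. ! ? plus optional closing quote/bracket) followed by
    // whitespace, or a bare newline so unpunctuated list items still flush.
    const sentence = /[.!?]+["')\]]*\s|\n/.exec(this.buffer);
    if (sentence) return sentence.index + sentence[0].length;

    // Otherwise fall back to the first clause break that comes late enough.
    const clause = /[,;:]\s/g;
    let m: RegExpExecArray | null;
    while ((m = clause.exec(this.buffer)) !== null) {
      if (m.index >= this.minClauseLength) return m.index + m[0].length;
    }
    return -1;
  }
}
```

Feeding it the deltas "Hello there. How a", "re you? I am fine, thanks for asking, really", " truly." with `minClauseLength` set to 20 yields "Hello there.", "How are you?", "I am fine, thanks for asking," and, on `flush()`, "really truly.". Abbreviations like “e.g.” still trigger a false sentence break with a rule this naive, and code is full of punctuation that isn’t prose, which is roughly where the “wacky on code” caveat comes from.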

Give it a spin

pi install npm:pi-simple-voice

Fully open source, MIT: grrowl/pi-simple-voice.

Feel free to use it, break it, extend it, and chuck us a pull request if you make anything nice! Let me know if you like it. Cheers.