Read aloud

A free, private text-to-speech tool: it turns any text into natural-sounding speech entirely on your device — nothing you paste is ever uploaded.

Paste anything. Pick a narrator. Hear it spoken back — fully in your browser.

FAQ

How this works under the hood, what model it uses, and what stays on your device.

  • Is it free?

    Yes — completely free, with no sign-up, no account, and no usage limits. The voice model runs on your own device, so there is no per-character API cost to pass on to you.

  • Is my text private?

    Yes. Everything runs in your browser — the text you paste is never sent to a server, and no audio is uploaded. Once the model has loaded you can even disconnect from the network and it keeps working.

  • Do I need to install anything or sign up?

    No. It is a web page — no extension, no app, and no account. Open it, paste text, and press Play. The only download is the voice model itself, which caches in your browser on first use.

  • Can I download the audio?

    Yes. After a clip plays you can save it as a WAV file, or generate and download the audio without playing it first. The file is created in your browser from the same on-device synthesis.

  • Where does the audio come from?

    Your browser. Kokoro-82M runs as ONNX via Transformers.js in a Web Worker. The text you paste is tokenized, fed through the model, and the resulting waveform is played back through your own AudioContext. Nothing is sent to any server.

  • What model is used?

    onnx-community/Kokoro-82M-v1.0-ONNX, 8-bit quantized (~92 MB). It ships 54 voices across American and British accents; the booth surfaces a curated 8 to keep the cartridge rack legible.

  • What happens on first use?

    Roughly 92 MB of model weights download and cache in your browser the first time you press Play. They go into the browser's built-in Cache Storage (the Cache API, managed by Transformers.js); subsequent runs read from the cache and start within a second or two.

  • Does it work offline?

    After the first model download, yes. The whole pipeline — tokenizer, model, audio playback — runs locally, so you can pull the network cable and the booth still works.

  • Does it use WebGPU?

    The default backend is WebAssembly with the 92 MB q8 model, so there is no WebGPU gate at the door. A “studio quality” toggle can switch to fp16 weights on WebGPU when available.

  • How is the player wired together?

    A small client library (kokoro-js) wraps the model. We drive its streaming API one sentence at a time and schedule each chunk back to back on a single AudioContext, so playback stays gapless as long as synthesis runs faster than realtime. The VU meter is bound to an AnalyserNode on that same context: real RMS, not a timer.
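
The queueing and metering described in the last answer can be sketched roughly as follows. This is a minimal illustration, not the page's actual source: `GaplessScheduler`, `Clock`, `rms`, and the 50 ms lead time are hypothetical names and values, and the 24 kHz sample rate in the usage note is an assumption about the model's output. The scheduler only computes start times, so it works against any object with a `currentTime` in seconds, which a real AudioContext provides.

```typescript
// Minimal clock shape; a browser AudioContext satisfies it via `currentTime`.
interface Clock {
  readonly currentTime: number; // seconds
}

// Hypothetical helper: hands out back-to-back start times for audio chunks.
class GaplessScheduler {
  private readonly clock: Clock;
  private readonly leadTime: number;
  private nextStart = 0; // time (s) at which the last queued chunk ends

  constructor(clock: Clock, leadTime: number = 0.05) {
    this.clock = clock;
    this.leadTime = leadTime;
  }

  // Returns the time at which a chunk of `frames` samples should start.
  schedule(frames: number, sampleRate: number): number {
    // Never schedule in the past: if the queue ran dry, restart slightly ahead of "now".
    const earliest = this.clock.currentTime + this.leadTime;
    const startAt = Math.max(this.nextStart, earliest);
    this.nextStart = startAt + frames / sampleRate;
    return startAt;
  }
}

// Root-mean-square level of a block of samples in [-1, 1], as a VU meter might display.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Example with a fake clock: two chunks queued at t = 0 play back to back.
const demoClock = { currentTime: 0 };
const demoScheduler = new GaplessScheduler(demoClock);
const firstStart = demoScheduler.schedule(24000, 24000);  // 1 s of audio
const secondStart = demoScheduler.schedule(12000, 24000); // starts where the first ends
console.log(firstStart, secondStart, rms(new Float32Array([0.5, 0.5])));
```

In a browser, the returned time would be passed to `AudioBufferSourceNode.start(when)` for each synthesized chunk, and the meter would call `AnalyserNode.getFloatTimeDomainData()` on each animation frame and feed the block to `rms`.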
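
For the WAV download mentioned in the answers above, here is a hedged sketch of how a waveform could be turned into a file entirely in the browser. It assumes mono 16-bit PCM, which the page does not specify; `encodeWav` is a hypothetical helper, not part of kokoro-js or Transformers.js.

```typescript
// Hypothetical helper: encodes mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size for PCM
  view.setUint16(20, 1, true);                           // audio format 1 = PCM
  view.setUint16(22, 1, true);                           // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range values, then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

In the page itself, the returned bytes could be wrapped in `new Blob([bytes], { type: "audio/wav" })` and offered through an object URL on a download link, so the file never leaves the device.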