Guide · updated July 2026 · licenses checked
6 Best Self-Hosted ElevenLabs Alternatives
ElevenLabs is excellent and priced like it knows it. Here are the open-source text-to-speech and voice-cloning models you can run on your own hardware instead, including the license traps that put some of them off-limits for commercial use.
TL;DR — our picks
Best all-rounder: Kokoro. Small, fast, Apache 2.0 licensed, and shockingly good for its size. Runs on a CPU.
Voice cloning: Chatterbox. MIT-licensed cloning from a short audio sample, with emotion control.
Smart-home / low power: Piper. Real-time on a Raspberry Pi.
| Model | Voice cloning | Commercial use | Hardware | Best for |
|---|---|---|---|---|
| Kokoro | No | ✔ Apache 2.0 | CPU is fine | Narration, apps, general TTS |
| Chatterbox | Yes | ✔ MIT | GPU recommended | Voice cloning with emotion control |
| OpenVoice v2 | Yes | ✔ MIT | GPU recommended | Cloning + tone/style control |
| Piper | No | ✔ MIT | CPU / Raspberry Pi | Home Assistant, embedded, speed |
| XTTS-v2 | Yes | ✘ Non-commercial | GPU, ~4GB+ VRAM | Multilingual cloning (personal use) |
| F5-TTS | Yes | ✘ Weights non-commercial | GPU recommended | Research-grade cloning quality |
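The table can also be read as a simple filter. The sketch below transcribes its rows into a small data structure and shortlists models by requirement. The `TTSModel` class and `shortlist` function are hypothetical names made up for illustration, and treating "GPU recommended" as "needs a GPU" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    name: str
    cloning: bool        # "Voice cloning" column
    commercial_ok: bool  # "Commercial use" column
    needs_gpu: bool      # assumption: "GPU recommended" counts as "needs a GPU"


# Rows transcribed from the comparison table.
MODELS = [
    TTSModel("Kokoro", cloning=False, commercial_ok=True, needs_gpu=False),
    TTSModel("Chatterbox", cloning=True, commercial_ok=True, needs_gpu=True),
    TTSModel("OpenVoice v2", cloning=True, commercial_ok=True, needs_gpu=True),
    TTSModel("Piper", cloning=False, commercial_ok=True, needs_gpu=False),
    TTSModel("XTTS-v2", cloning=True, commercial_ok=False, needs_gpu=True),
    TTSModel("F5-TTS", cloning=True, commercial_ok=False, needs_gpu=True),
]


def shortlist(need_cloning: bool, need_commercial: bool, have_gpu: bool) -> list[str]:
    """Return the names of models that satisfy every stated requirement."""
    picks = []
    for m in MODELS:
        if need_cloning and not m.cloning:
            continue
        if need_commercial and not m.commercial_ok:
            continue
        if m.needs_gpu and not have_gpu:
            continue
        picks.append(m.name)
    return picks


# Commercial voice cloning with a GPU available.
print(shortlist(need_cloning=True, need_commercial=True, have_gpu=True))
# → ['Chatterbox', 'OpenVoice v2']
```

Without a GPU and without a cloning requirement, the same filter returns Kokoro and Piper, which matches the TL;DR picks.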
The six, in detail
Kokoro became the community favorite for a reason: it's a tiny model (82M parameters, roughly 100 times smaller than an 8B-parameter LLM) that produces clean, natural narration, runs in real time on ordinary CPUs, and ships under a permissive license. If you want "read this text aloud, nicely, on my own machine," start here and you may never need anything else.
Trade-off: you pick from its built-in voices — it doesn't clone yours.
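To put the 82M-parameter figure in perspective, here is a back-of-the-envelope calculation of raw weight size. It is arithmetic only: the 8B-parameter comparison model is an assumption chosen for illustration, and actual checkpoint files and runtime memory will differ with file format and framework overhead.

```python
KOKORO_PARAMS = 82_000_000        # parameter count quoted above
LLM_PARAMS = 8_000_000_000        # assumption: a typical 8B-parameter LLM


def weights_size_mb(params: int, bytes_per_param: int) -> float:
    """Raw size of the weights in megabytes (1 MB = 1e6 bytes)."""
    return params * bytes_per_param / 1e6


fp32_mb = weights_size_mb(KOKORO_PARAMS, 4)   # 32-bit floats: 328.0 MB
fp16_mb = weights_size_mb(KOKORO_PARAMS, 2)   # 16-bit floats: 164.0 MB
size_ratio = LLM_PARAMS / KOKORO_PARAMS       # about 97.6

print(f"fp32: {fp32_mb:.0f} MB, fp16: {fp16_mb:.0f} MB, LLM is {size_ratio:.0f}x larger")
# → fp32: 328 MB, fp16: 164 MB, LLM is 98x larger
```

A few hundred megabytes of weights fits comfortably in ordinary RAM, which is why a CPU is enough.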
Kokoro on GitHub →
Released by Resemble AI under MIT, Chatterbox clones a voice from a short reference clip and, as its party trick, lets you dial emotion and intensity up or down. It's the closest open-source answer to what people actually buy ElevenLabs for.
Trade-off: it wants a GPU for comfortable generation speeds, and like all cloning models, output quality tracks the quality of your reference audio.
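Since output quality follows reference quality, a quick pre-flight check of the reference clip can save a wasted run. This is a minimal sketch using only Python's standard `wave` module. The function names are hypothetical, and the thresholds (5 seconds, 16 kHz) are illustrative assumptions, not requirements published by Chatterbox or any other model.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file with the standard library."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "seconds": wf.getnframes() / rate,
            "sample_rate": rate,
            "channels": wf.getnchannels(),
        }


def reference_warnings(info: dict, min_seconds: float = 5.0,
                       min_rate: int = 16_000) -> list[str]:
    """Flag clips that look too short or too low-fidelity.

    Both thresholds are illustrative defaults, not model requirements.
    """
    warnings = []
    if info["seconds"] < min_seconds:
        warnings.append(f"clip is only {info['seconds']:.1f}s long")
    if info["sample_rate"] < min_rate:
        warnings.append(f"sample rate is only {info['sample_rate']} Hz")
    return warnings
```

Calling `reference_warnings(describe_wav("my_voice.wav"))` on a hypothetical clip returns an empty list when nothing looks off. It cannot detect background noise or room echo, so listening to the clip yourself is still worthwhile.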
Chatterbox on GitHub →
OpenVoice (from MyShell) separates whose voice from how it's spoken: clone a speaker, then independently adjust style. Version 2 moved to an MIT license, which made it safe for commercial projects.
Trade-off: setup is more researcher-flavored than Chatterbox; expect to read the README properly.
OpenVoice on GitHub →
Piper is the workhorse of the Home Assistant world: MIT-licensed, absurdly efficient, real-time speech on hardware as small as a Raspberry Pi, with a big catalog of ready-made voices across many languages. For smart-home announcements, accessibility, or anything embedded, it's the default.
Trade-off: voices are pleasant but noticeably more "TTS" than Kokoro or the cloning models.
Piper on GitHub →
Coqui's XTTS-v2 does multilingual voice cloning from a few seconds of audio and still holds up. Two caveats: the company behind it shut down (the community keeps forks alive), and the model ships under the non-commercial Coqui Public Model License. Fine for personal projects; not for anything that earns.
Coqui TTS on GitHub →
F5-TTS produces some of the most natural cloned speech in open source. The catch mirrors XTTS: the released model weights carry non-commercial terms (they were trained on a non-commercial dataset). Beautiful for experiments and personal use; check licensing carefully before anything commercial.
F5-TTS on GitHub →
Hardware reality check
Good news: TTS is far lighter than chat models.
- CPU only — Kokoro and Piper run in real time. No GPU required at all.
- 4–8GB VRAM — comfortable territory for XTTS-v2 and most cloning workloads.
- Batch jobs (audiobook-length narration, datasets) — renting a GPU by the hour beats waiting overnight. Try RunPod → · Try Vast.ai →
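"Real time" in the list above has a concrete meaning: the real-time factor (RTF) is synthesis time divided by the duration of the audio produced, and anything below 1.0 is faster than real time. The sketch below computes RTF from a timed run and uses it to estimate a batch job. The timing, the audiobook length, and the hourly GPU price are all made-up illustrative numbers, not benchmarks or quotes from any provider.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def batch_estimate(audio_hours: float, rtf: float,
                   price_per_hour: float) -> tuple[float, float]:
    """Return (compute hours, rental cost) for a batch of audio_hours."""
    compute_hours = audio_hours * rtf
    return compute_hours, compute_hours * price_per_hour


# Hypothetical measurement: 60 s of speech took 15 s to generate.
rtf = real_time_factor(synthesis_seconds=15.0, audio_seconds=60.0)

# Hypothetical job: a 10-hour audiobook on a GPU rented at $0.50/hour.
hours, cost = batch_estimate(audio_hours=10.0, rtf=rtf, price_per_hour=0.50)
print(f"RTF {rtf:.2f}: {hours:.1f} compute hours, about ${cost:.2f}")
# → RTF 0.25: 2.5 compute hours, about $1.25
```

Measuring RTF on a one-minute sample with your own hardware first tells you whether a long job finishes over lunch or needs a rented GPU.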
FAQ
Is there a truly free ElevenLabs alternative?
Yes — everything above is free to run. Your costs are hardware and time. Kokoro on a CPU is the lowest-friction starting point.
Can I legally clone my own voice?
Cloning your own voice is fine. Cloning someone else's without consent ranges from ethically bad to legally actionable, depending on where you live and what you do with it. Don't.
Which one should a beginner start with?
Kokoro if you just need speech; Chatterbox if you specifically want cloning. Both have active communities when you get stuck.