Overview
Runtime Text To Speech is a plugin that enables real-time, offline, and cross-platform text-to-speech synthesis. It supports 39 languages, over 900 voices, and 160+ voice qualities – now featuring Kokoro 🚀, a cutting-edge open-source voice model family with studio-quality output. The plugin is fast, lightweight, and ideal for games, apps, and projects requiring natural-sounding speech.
Currently, the plugin supports the following platforms: Windows, Linux, Mac, Android (including Meta Quest), and iOS.
📹 See It in Action
Watch the YouTube Demo or test generic voice samples at Piper Samples.
Kokoro
The plugin now supports Kokoro voice models, a family of high-quality open-source TTS models recently published on Hugging Face.
- 45 high-quality models across 6 languages:
🇺🇸 English (US) • 🇬🇧 English (UK) • 🇪🇸 Spanish • 🇧🇷 Portuguese • 🇮🇳 Hindi • 🇫🇷 French
- Live preview available: Test Kokoro Voices
The Kokoro voice models are among the highest-quality open-source TTS solutions currently available.
Installation
To get started, install voice models via the plugin settings on the first run. After installation, you can begin using the plugin in your project. For detailed instructions, refer to the How to use the plugin page.
Plugin Details
This plugin provides real-time text-to-speech synthesis using the Piper, Kokoro, and ONNX Runtime libraries. You can download and manage multiple voice models in the editor and then package them with your project.
The core functionality consists of text input processing and voice model selection for synthesis. Some voice models support multiple speakers: for instance, English LibriTTS includes over 900 different speakers, and German Thorsten Emotional has 7. The output is PCM audio data (in float format) together with its sample rate and number of channels. Converting this raw audio data into a playable sound wave requires the Runtime Audio Importer plugin.
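To make the shape of that output concrete, here is a minimal standalone C++ sketch of how float PCM data, a sample rate, and a channel count relate to each other. It is not the plugin's API: the `PcmBuffer` struct and the `FrameCount`, `DurationSeconds`, and `ToInt16` helpers are hypothetical names introduced for illustration. The sketch also assumes that multi-channel samples are interleaved and that sample values nominally lie in [-1, 1], neither of which is stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for synthesized audio; NOT a type from the plugin.
struct PcmBuffer {
    std::vector<float> samples;  // assumed interleaved, nominally in [-1, 1]
    int sampleRate = 0;          // frames per second, e.g. 22050
    int channels = 0;            // 1 = mono, 2 = stereo
};

// Number of frames (one sample per channel) held in the buffer.
inline std::size_t FrameCount(const PcmBuffer& pcm) {
    assert(pcm.channels > 0);
    return pcm.samples.size() / static_cast<std::size_t>(pcm.channels);
}

// Playback length in seconds: frames divided by frames per second.
inline double DurationSeconds(const PcmBuffer& pcm) {
    assert(pcm.sampleRate > 0);
    return static_cast<double>(FrameCount(pcm)) / pcm.sampleRate;
}

// Convert float samples to signed 16-bit PCM, clamping out-of-range values.
inline std::vector<std::int16_t> ToInt16(const PcmBuffer& pcm) {
    std::vector<std::int16_t> out;
    out.reserve(pcm.samples.size());
    for (float s : pcm.samples) {
        const float clamped = std::max(-1.0f, std::min(1.0f, s));
        out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

For example, a mono buffer holding 22050 samples at a sample rate of 22050 has a duration of one second. Inside an actual project, the Runtime Audio Importer plugin mentioned above handles turning the raw data into a playable sound wave, so helpers like these would only be useful for inspecting or exporting the data yourself.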