Overview

Runtime Text To Speech Documentation

Runtime Text To Speech is a plugin that enables real-time, offline, and cross-platform text-to-speech synthesis. It supports over 35 languages and 900 voices, with more than 120 voice qualities. The plugin is fast, lightweight, and easy to use, making it ideal for games, applications, and other projects that require text-to-speech functionality.

Currently, the plugin supports the following platforms: Windows, Linux, Mac, Android (including Meta Quest), and iOS. Support for Apple Vision Pro is possible but untested.

📹 Watch the plugin in action! Check out the YouTube video demonstration to see how it works.

If you want to preview how the synthesized voices sound before acquiring the plugin, you can test the voice examples online using this link.

Installation

To get started, install voice models via the plugin settings on the first run. After installation, you can begin using the plugin in your project. For detailed instructions, refer to the How to use the plugin page.

Plugin Details

This plugin provides real-time text-to-speech synthesis using Piper (which relies on the utf8 and uni-algo libraries) and ONNX Runtime. You can download and manage multiple voice models in the editor and then package them with your project.

The core functionality consists of text input processing and voice model selection for synthesis. Some voice models support multiple speakers: for instance, English LibriTTS includes over 900 different speakers, and German Thorsten Emotional has 7. The output is PCM audio data in float format, accompanied by its sample rate and number of channels. Converting this raw audio data into a playable sound wave requires the Runtime Audio Importer plugin.
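To make the output format more concrete, here is a minimal sketch in standard C++ of how float PCM data relates to its sample rate and channel count. It is not the plugin's API: `PcmAudio`, `DurationSeconds`, and `ToInt16` are hypothetical names introduced only for illustration, and the sketch assumes interleaved samples nominally in the range [-1.0, 1.0].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration only; not a type from the plugin.
struct PcmAudio {
    std::vector<float> samples;  // interleaved samples, assumed to lie in [-1.0, 1.0]
    int sampleRate = 0;          // frames per second, e.g. 22050
    int numChannels = 0;         // 1 = mono, 2 = stereo
};

// Duration in seconds: samples / channels gives frames, frames / rate gives seconds.
double DurationSeconds(const PcmAudio& audio) {
    assert(audio.sampleRate > 0 && audio.numChannels > 0);
    const std::size_t frames =
        audio.samples.size() / static_cast<std::size_t>(audio.numChannels);
    return static_cast<double>(frames) / audio.sampleRate;
}

// Convert float samples to signed 16-bit PCM, clamping out-of-range values first.
std::vector<std::int16_t> ToInt16(const std::vector<float>& samples) {
    std::vector<std::int16_t> out;
    out.reserve(samples.size());
    for (float s : samples) {
        const float clamped = std::max(-1.0f, std::min(1.0f, s));
        out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

Under these assumptions, a mono buffer of 44100 samples at 22050 Hz lasts two seconds, and the same buffer interpreted as stereo lasts one. The 16-bit conversion is shown only because many audio file formats store integer samples; the plugin itself hands over float data.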