
Overview

Runtime Text To Speech Documentation

Runtime Text To Speech is a plugin for real-time, offline, cross-platform text-to-speech synthesis. It supports over 35 languages and 100 voices, with more than 120 voice qualities. Synthesis is fast. GPU acceleration is theoretically possible but not yet implemented.

The plugin currently supports Windows, Linux, Mac, Android, and iOS, although Mac and iOS support is experimental. Meta Quest and Apple Vision Pro may also work but are untested.

To hear how the synthesized voices sound before acquiring the plugin, you can listen to the voice samples online using this link.

Installation

To get started, install voice models from the plugin settings on first run. Once they are installed, you can use the plugin in your project. For detailed instructions, refer to the How to use the plugin page.

Plugin Details

The plugin performs real-time text-to-speech synthesis using Piper (which depends on the utf8 and uni-algo libraries) and ONNX Runtime. You can download and manage multiple voice models from the editor, and these models can then be packaged with your project.

The plugin's core workflow is simple: you provide text and select a voice model, and the plugin returns the synthesized speech as raw PCM audio data (float samples), together with the sample rate and the number of channels. To convert this raw data into a playable sound wave, you may need the Runtime Audio Importer plugin.
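Because the output is raw float PCM plus a sample rate and a channel count, a couple of small calculations come up if you handle that data yourself. Two examples are working out the clip length and converting to the signed 16-bit samples that many audio APIs expect. The following is a minimal standard C++ sketch, not the plugin's API. The `SynthesizedSpeech` struct and the `durationSeconds` and `toInt16` helpers are hypothetical. The sketch assumes the samples are interleaved and nominally in the range [-1.0, 1.0]. Inside Unreal Engine, Runtime Audio Importer remains the documented route to a playable sound wave.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container mirroring what the plugin is described as returning:
// float PCM samples plus sample rate and channel count. NOT the plugin's real type.
struct SynthesizedSpeech {
    std::vector<float> pcm;  // interleaved float samples, assumed to lie in [-1.0, 1.0]
    int sampleRate = 0;      // frames per second
    int numChannels = 0;     // e.g. 1 for mono
};

// Duration in seconds = (total samples / channels) / sample rate.
// Returns 0.0 for invalid metadata instead of dividing by zero.
double durationSeconds(const SynthesizedSpeech& speech) {
    if (speech.sampleRate <= 0 || speech.numChannels <= 0) {
        return 0.0;
    }
    const double frames = static_cast<double>(speech.pcm.size()) / speech.numChannels;
    return frames / speech.sampleRate;
}

// Convert float samples to signed 16-bit PCM.
// Out-of-range values are clamped to [-1.0, 1.0]; NaN is treated as silence.
std::vector<int16_t> toInt16(const std::vector<float>& pcm) {
    std::vector<int16_t> out;
    out.reserve(pcm.size());
    for (float sample : pcm) {
        if (std::isnan(sample)) {
            sample = 0.0f;
        }
        const float clamped = std::clamp(sample, -1.0f, 1.0f);
        // Symmetric scaling by 32767 keeps +1.0 and -1.0 equally loud.
        out.push_back(static_cast<int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

Scaling by 32767 rather than 32768 means -1.0 maps to -32767 instead of the minimum value -32768. This trades one code point for symmetry and avoids overflow at +1.0.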