
How to use the plugin

The Runtime Text To Speech plugin synthesizes text into speech using downloadable voice models. These models are downloaded and managed in the plugin settings within the editor, then packaged with your project for use at runtime. Follow the steps below to get started.

Editor side

Download the appropriate voice models for your project as described here. You can download multiple voice models at the same time.

Runtime side

Create the synthesizer using the CreateRuntimeTextToSpeech function. Ensure you maintain a reference to it (e.g. as a separate variable in Blueprints or a UPROPERTY in C++) to prevent it from being garbage collected.

An example of creating a Runtime Text To Speech synthesizer in Blueprints
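For the C++ side, a minimal sketch of the same step is shown below. Only the CreateRuntimeTextToSpeech function name comes from this page; the URuntimeTextToSpeech class name, the header paths, and the exact factory signature are assumptions, so check the plugin's headers for the real declarations. Later sketches on this page reuse this hypothetical component and add members to it as needed.

```cpp
// MyTTSComponent.h — hypothetical component used by the C++ sketches on this page
#pragma once

#include "CoreMinimal.h"
#include "Components/ActorComponent.h"
#include "MyTTSComponent.generated.h"

class URuntimeTextToSpeech; // assumed synthesizer class name

UCLASS()
class UMyTTSComponent : public UActorComponent
{
	GENERATED_BODY()

public:
	void InitSynthesizer();

private:
	// Keeping the synthesizer in a UPROPERTY prevents it from being garbage collected
	UPROPERTY()
	URuntimeTextToSpeech* Synthesizer = nullptr;
};

// MyTTSComponent.cpp
#include "MyTTSComponent.h"
// #include the plugin's synthesizer header here (its name depends on the plugin version)

void UMyTTSComponent::InitSynthesizer()
{
	// CreateRuntimeTextToSpeech is the factory named on this page; calling it as a
	// static function on the synthesizer class is an assumption
	Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
}
```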

Synthesizing Speech

The plugin offers two modes of text-to-speech synthesis:

  1. Regular Text-to-Speech: Synthesizes the entire text and returns the complete audio when finished
  2. Streaming Text-to-Speech: Provides audio chunks as they're generated, allowing for real-time processing

Each mode supports two methods for selecting voice models:

  • By Name: Select a voice model by its name (recommended for UE 5.4+)
  • By Object: Select a voice model by direct reference (recommended for UE 5.3 and earlier)

Regular Text-to-Speech

By Name

The Text To Speech (By Name) function is more convenient in Blueprints starting from UE 5.4, as it lets you select a voice model from a dropdown list of the downloaded models. In UE 5.3 and earlier this dropdown doesn't appear, so if you're using an older version, you'll need to manually iterate over the array of voice models returned by GetDownloadedVoiceModels to select the one you need.

An example of using Text To Speech by Name in Blueprints
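On engine versions without the dropdown, the manual selection described above boils down to scanning the downloaded models for the one you want. A rough C++ equivalent is sketched below (continuing the hypothetical UMyTTSComponent). GetDownloadedVoiceModels is named on this page, but calling it on the synthesizer, its element type, and the name check are assumptions.

```cpp
// Hypothetical lookup: find a downloaded voice model whose name matches.
// Only the GetDownloadedVoiceModels name comes from this page; the plain UObject*
// element type and the GetName() check stand in for whatever the plugin actually exposes.
UObject* UMyTTSComponent::FindVoiceModel(const FString& ModelName) const
{
	if (!Synthesizer)
	{
		return nullptr;
	}

	for (UObject* VoiceModel : Synthesizer->GetDownloadedVoiceModels())
	{
		if (VoiceModel && VoiceModel->GetName().Contains(ModelName))
		{
			return VoiceModel;
		}
	}

	return nullptr;
}
```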

By Object

The Text To Speech (By Object) function works across all versions of Unreal Engine but presents the voice models as a dropdown list of asset references, which is less intuitive. This method is suitable for UE 5.3 and earlier, or if your project requires a direct reference to a voice model asset for any reason.

An example of using Text To Speech by Object in Blueprints

If you've downloaded the models but can't see them in the list, open the Voice Model dropdown, click the settings (gear) icon, and enable both Show Plugin Content and Show Engine Content to make the models visible.
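In C++, a by-object call using a model reference (for example, one returned by the lookup sketched earlier) might look roughly like this. The TextToSpeechByObject name is inferred from the streaming variant listed later on this page, and the parameter order and delegate argument are placeholders rather than the plugin's actual signature.

```cpp
void UMyTTSComponent::SpeakByObject(UObject* VoiceModel, const FString& Text)
{
	if (!Synthesizer || !VoiceModel)
	{
		return;
	}

	// OnSpeechResult is a hypothetical delegate member bound to the handler shown
	// under Audio Playback; speaker ID 0 is the default (see Speaker Selection)
	Synthesizer->TextToSpeechByObject(VoiceModel, /*SpeakerId*/ 0, Text, OnSpeechResult);
}
```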

Streaming Text-to-Speech

For longer texts or when you want to process audio data in real-time as it's being generated, you can use the streaming versions of the Text-to-Speech functions:

  • Streaming Text To Speech (By Name) (StreamingTextToSpeechByName in C++)
  • Streaming Text To Speech (By Object) (StreamingTextToSpeechByObject in C++)

These functions provide audio data in chunks as they're generated, allowing for immediate processing without waiting for the entire synthesis to complete. This is useful for various applications like real-time audio playback, live visualization, or any scenario where you need to process speech data incrementally.

Streaming By Name

The Streaming Text To Speech (By Name) function works similarly to the regular version but provides audio in chunks through the On Speech Chunk delegate.

An example of using Streaming Text To Speech by Name in Blueprints
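In C++, a handler for the On Speech Chunk delegate could be sketched as follows (continuing the hypothetical component). The parameters mirror the data described under Audio Playback below (float PCM bytes, sample rate, channel count); the delegate's real signature may differ, and AccumulatedPCMData is a hypothetical TArray<uint8> member.

```cpp
// Hypothetical chunk handler; the parameters mirror the data described under
// Audio Playback, not a documented delegate signature.
void UMyTTSComponent::OnSpeechChunkReceived(const TArray<uint8>& ChunkPCMData, int32 SampleRate, int32 NumOfChannels)
{
	// Option 1: accumulate the chunks and import them once synthesis completes
	AccumulatedPCMData.Append(ChunkPCMData);

	// Option 2: forward each chunk to a streaming sound wave (e.g. one created with
	// the Runtime Audio Importer) for near-real-time playback
}
```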

Streaming By Object

The Streaming Text To Speech (By Object) function provides the same streaming functionality but takes a voice model object reference.

An example of using Streaming Text To Speech by Object in Blueprints

Audio Playback

For regular (non-streaming) text-to-speech, the On Speech Result delegate provides the synthesized audio as PCM data in float format (as a byte array in Blueprints or TArray<uint8> in C++), along with the Sample Rate and Num Of Channels.

For playback, it's recommended to use the Runtime Audio Importer plugin to convert raw audio data into a playable sound wave.

Here's an example of how the Blueprint nodes for synthesizing text and playing the audio might look (Copyable nodes):
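A C++ counterpart of that flow hands the PCM data from On Speech Result to the Runtime Audio Importer and plays the resulting sound wave. The importer calls shown here (CreateRuntimeAudioImporter, the OnResultNative delegate, ImportAudioFromRAWBuffer) come from the Runtime Audio Importer plugin and may differ between versions; the TTS handler signature and the AudioImporter member are assumptions.

```cpp
#include "RuntimeAudioImporterLibrary.h"
#include "Kismet/GameplayStatics.h"

// Hypothetical handler for the On Speech Result delegate; the parameters mirror the
// data this page says the delegate provides (success flag, float PCM bytes, sample
// rate, channel count), not a documented signature.
void UMyTTSComponent::OnSpeechResultReceived(bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumOfChannels)
{
	if (!bSuccess)
	{
		return;
	}

	// AudioImporter is a hypothetical UPROPERTY member, kept referenced so the
	// importer isn't garbage collected before the asynchronous import finishes
	AudioImporter = URuntimeAudioImporterLibrary::CreateRuntimeAudioImporter();

	AudioImporter->OnResultNative.AddWeakLambda(this, [this](URuntimeAudioImporterLibrary*, UImportedSoundWave* ImportedSoundWave, ERuntimeImportStatus Status)
	{
		if (Status == ERuntimeImportStatus::SuccessfulImport)
		{
			// The imported sound wave behaves like a regular sound wave, so any
			// playback path works
			UGameplayStatics::PlaySound2D(this, ImportedSoundWave);
		}
	});

	// The synthesized audio is float PCM, so import it as 32-bit float RAW data
	AudioImporter->ImportAudioFromRAWBuffer(AudioData, ERuntimeRAWAudioFormat::Float32, SampleRate, NumOfChannels);
}
```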

Cancelling Text-to-Speech

You can cancel an ongoing text-to-speech synthesis operation at any time by calling the CancelSpeechSynthesis function on your synthesizer instance:

Cancelling Text To Speech in Blueprints
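From C++, cancellation is a single call on the synthesizer reference you kept, for example:

```cpp
void UMyTTSComponent::StopSpeaking()
{
	if (Synthesizer)
	{
		// CancelSpeechSynthesis is named on this page; any return value is ignored here
		Synthesizer->CancelSpeechSynthesis();
	}
}
```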

When a synthesis is cancelled:

  • The synthesis process will stop as soon as possible
  • Any ongoing callbacks will be terminated
  • The completion delegate will be called with bSuccess = false and an error message indicating the synthesis was cancelled
  • Any resources allocated for the synthesis will be properly cleaned up

This is particularly useful for long texts or when you need to interrupt playback to start a new synthesis.

Speaker Selection

Both Text To Speech functions accept an optional speaker ID parameter, which is useful when working with voice models that support multiple speakers. You can use the GetSpeakerCountFromVoiceModel or GetSpeakerCountFromModelName functions to check if multiple speakers are supported by your chosen voice model. If multiple speakers are available, simply specify your desired speaker ID when calling the Text To Speech functions. Some voice models offer extensive variety - for example, English LibriTTS includes over 900 different speakers to choose from.
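As a rough C++ illustration (continuing the hypothetical component), you could query the speaker count first and then pass the chosen speaker ID to the synthesis call. GetSpeakerCountFromModelName is named on this page, but the synchronous int32 return shown here and the TextToSpeechByName call are assumptions.

```cpp
void UMyTTSComponent::SpeakWithSpeaker(const FString& ModelName, int32 DesiredSpeakerId, const FString& Text)
{
	if (!Synthesizer)
	{
		return;
	}

	// Assumed to return the number of speakers the model supports
	const int32 SpeakerCount = Synthesizer->GetSpeakerCountFromModelName(ModelName);

	// Single-speaker models fall back to speaker 0; otherwise clamp to a valid ID
	const int32 SpeakerId = (SpeakerCount > 1) ? FMath::Clamp(DesiredSpeakerId, 0, SpeakerCount - 1) : 0;

	// TextToSpeechByName and the delegate argument are placeholders (see the earlier sketches)
	Synthesizer->TextToSpeechByName(ModelName, SpeakerId, Text, OnSpeechResult);
}
```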

The Runtime Audio Importer plugin also provides additional features like exporting audio data to a file, passing it to SoundCue, MetaSound, and more. For further details, check out the Runtime Audio Importer documentation.