How to use the plugin
The Runtime Text To Speech plugin synthesizes text into speech using downloadable voice models. The models are downloaded and managed in the plugin settings within the editor, and are then packaged with your project for use at runtime. Follow the steps below to get started.
Editor side
Download the appropriate voice models for your project as described here. You can download multiple voice models at the same time.
Runtime side
Create the synthesizer using the CreateRuntimeTextToSpeech function. Ensure you maintain a reference to it (e.g. as a separate variable in Blueprints or a UPROPERTY in C++) to prevent it from being garbage collected.
- Blueprint
- C++
// Create the Runtime Text To Speech synthesizer in C++
URuntimeTextToSpeech* Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
// Ensure the synthesizer is referenced correctly to prevent garbage collection (e.g. as a UPROPERTY)
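For reference, here's a minimal sketch of keeping the synthesizer alive as a UPROPERTY member of one of your own classes. The actor class, property name, and the plugin header name are illustrative assumptions, not part of the plugin's API:

// MyTTSActor.h -- a minimal sketch; class, property, and header names are illustrative
#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "RuntimeTextToSpeech.h" // assumption: the actual plugin header name may differ depending on the plugin version
#include "MyTTSActor.generated.h"

UCLASS()
class AMyTTSActor : public AActor
{
    GENERATED_BODY()

protected:
    // The UPROPERTY reference keeps the synthesizer from being garbage collected while this actor is alive
    UPROPERTY()
    URuntimeTextToSpeech* Synthesizer = nullptr;

    virtual void BeginPlay() override
    {
        Super::BeginPlay();
        Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
    }
};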
Once the synthesizer is created, you can call either of the following functions to synthesize text:
- Text To Speech (By Name) (TextToSpeechByName in C++)
- Text To Speech (By Object) (TextToSpeechByObject in C++)
By Name
- Blueprint
- C++
The Text To Speech (By Name) function is more convenient in Blueprints starting from UE 5.4. It allows you to select voice models from a dropdown list of the downloaded models. In UE 5.3 and earlier, this dropdown doesn't appear, so if you're using an older version, you'll need to manually iterate over the array of voice models returned by GetDownloadedVoiceModels to select the one you need.
In C++, the selection of voice models can be slightly more complex due to the lack of a dropdown list. You can use the GetDownloadedVoiceModelNames function to retrieve the names of the downloaded voice models and select the one you need. Afterward, call the TextToSpeechByName function to synthesize text using the selected voice model name.
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

// If there are downloaded voice models, use the first one to synthesize text, just as an example
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model

    Synthesizer->TextToSpeechByName(VoiceName, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([](bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumChannels)
    {
        UE_LOG(LogTemp, Log, TEXT("TextToSpeech result: %s, AudioData size: %d, SampleRate: %d, NumChannels: %d"), bSuccess ? TEXT("Success") : TEXT("Failed"), AudioData.Num(), SampleRate, NumChannels);
    }));
    return;
}
By Object
- Blueprint
- C++
The Text To Speech (By Object) function works across all versions of Unreal Engine but presents the voice models as a dropdown list of asset references, which is less intuitive. This method is ideal for UE 5.3 and earlier, or if your project requires a direct reference to a voice model asset for any reason.
If you've downloaded the models but can't see them, open the Voice Model dropdown, click the settings (gear icon), and enable both Show Plugin Content and Show Engine Content to make the models visible.
In C++, the selection of voice models can be slightly more complex due to the lack of a dropdown list. You can use the GetDownloadedVoiceModelNames function to retrieve the names of the downloaded voice models and select the one you need. Then, call the GetVoiceModelFromName function to get the voice model object and pass it to the TextToSpeechByObject function to synthesize text.
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

// If there are downloaded voice models, use the first one to synthesize text, for example
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model

    TSoftObjectPtr<URuntimeTTSModel> VoiceModel;
    if (!URuntimeTTSLibrary::GetVoiceModelFromName(VoiceName, VoiceModel))
    {
        UE_LOG(LogTemp, Error, TEXT("Failed to get voice model from name: %s"), *VoiceName.ToString());
        return;
    }

    Synthesizer->TextToSpeechByObject(VoiceModel, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([](bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumChannels)
    {
        UE_LOG(LogTemp, Log, TEXT("TextToSpeech result: %s, AudioData size: %d, SampleRate: %d, NumChannels: %d"), bSuccess ? TEXT("Success") : TEXT("Failed"), AudioData.Num(), SampleRate, NumChannels);
    }));
    return;
}
Playback
The On Speech Result delegate provides the synthesized audio as PCM data in 32-bit float format (as a byte array in Blueprints or TArray<uint8> in C++), along with the Sample Rate and Num Of Channels. You can process this data as needed.
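If you want to inspect or process the samples yourself, the byte buffer can be viewed as interleaved 32-bit float samples. A minimal sketch, assuming the AudioData, SampleRate, and NumChannels parameters of the delegate shown in the examples above:

// A minimal sketch of inspecting the raw PCM buffer received from the On Speech Result delegate
// The buffer is interleaved 32-bit float samples: frame 0 [channel 0, channel 1, ...], frame 1 [channel 0, ...], and so on
const float* Samples = reinterpret_cast<const float*>(AudioData.GetData());
const int32 NumSamples = AudioData.Num() / static_cast<int32>(sizeof(float)); // total samples across all channels
const int32 NumFrames = NumChannels > 0 ? NumSamples / NumChannels : 0;
const float DurationSeconds = SampleRate > 0 ? static_cast<float>(NumFrames) / SampleRate : 0.f;
UE_LOG(LogTemp, Log, TEXT("Synthesized %d frames (~%.2f seconds) of audio"), NumFrames, DurationSeconds);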
For playback, it's recommended to use the Runtime Audio Importer plugin to convert raw audio data into a playable sound wave. However, this is optional, and you can handle raw PCM audio data with your own solution if desired.
To use Runtime Audio Importer, call Import Audio From RAW Buffer with the RAW Buffer (from the On Speech Result delegate), Sample Rate, and Num Of Channels to generate a sound wave.
Once you have the sound wave, you can use it like any regular sound wave in Unreal Engine - playing it back, saving it to a save game file, or further processing it.
- Blueprint
- C++
Here's an example of how the Blueprint nodes for synthesizing text and playing the audio might look (Copyable nodes):
Here's an example of how to synthesize text and play the audio in C++:
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
// Ensure "this" is a valid and referenced UObject (must not be eligible for garbage collection during the callback)
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

// If there are downloaded voice models, use the first one to synthesize text, for example
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model

    Synthesizer->TextToSpeechByName(VoiceName, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([this](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumOfChannels)
    {
        if (!bSuccess)
        {
            UE_LOG(LogTemp, Error, TEXT("TextToSpeech failed"));
            return;
        }

        // Create the Runtime Audio Importer to process the audio data
        URuntimeAudioImporterLibrary* RuntimeAudioImporter = URuntimeAudioImporterLibrary::CreateRuntimeAudioImporter();

        // Prevent the RuntimeAudioImporter from being garbage collected by adding it to the root (you can also use a UPROPERTY, TStrongObjectPtr, etc.)
        RuntimeAudioImporter->AddToRoot();

        RuntimeAudioImporter->OnResultNative.AddWeakLambda(RuntimeAudioImporter, [this](URuntimeAudioImporterLibrary* Importer, UImportedSoundWave* ImportedSoundWave, ERuntimeImportStatus Status)
        {
            // Once done, remove it from the root to allow garbage collection
            Importer->RemoveFromRoot();

            if (Status != ERuntimeImportStatus::SuccessfulImport)
            {
                UE_LOG(LogTemp, Error, TEXT("Failed to import audio, status: %s"), *UEnum::GetValueAsString(Status));
                return;
            }

            // Play the imported sound wave (ensure a reference is kept to prevent garbage collection)
            UGameplayStatics::PlaySound2D(GetWorld(), ImportedSoundWave);
        });
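        // Import the raw 32-bit float PCM data into a playable sound wave; the OnResultNative callback bound above fires when the import finishes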
        RuntimeAudioImporter->ImportAudioFromRAWBuffer(AudioData, ERuntimeRAWAudioFormat::Float32, SampleRate, NumOfChannels);
    }));
    return;
}
The Runtime Audio Importer plugin also provides additional features, such as exporting audio data to a file, passing it to a SoundCue or MetaSound, and more. For further details, check out the Runtime Audio Importer documentation.