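How to use the plugin
The Runtime Text To Speech plugin synthesizes text into speech using downloadable voice models. These models are managed and downloaded in the plugin settings within the editor, and packaged for use at runtime. Follow the steps below to get started.
Editor side
Download the voice models suitable for your project as described in the voice model download instructions. You can download multiple voice models at the same time.
Runtime side
Create the synthesizer using the CreateRuntimeTextToSpeech function. Make sure to keep a reference to it (for example, as a separate variable in Blueprints or as a UPROPERTY in C++) so that it is not garbage collected.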
- Blueprint
- C++
// Create the Runtime Text To Speech synthesizer in C++
URuntimeTextToSpeech* Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
// Ensure the synthesizer is referenced correctly to prevent garbage collection (e.g. as a UPROPERTY)
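Synthesizing speech
The plugin provides two text-to-speech synthesis modes:
- Regular Text-to-Speech: synthesizes the entire text and returns the complete audio once finished
- Streaming Text-to-Speech: delivers audio chunks as they are generated, so they can be processed in real time
Each mode supports two ways of selecting a voice model:
- By name: select the voice model by its name (recommended for UE 5.4+)
- By object: reference the voice model object directly (recommended for UE 5.3 and earlier)
Regular Text-to-Speech
By name
- Blueprint
- C++
Starting with UE 5.4, the Text To Speech (By Name) function is the more convenient option in Blueprints. It lets you pick a voice model from a dropdown list of downloaded models. In UE 5.3 and earlier this dropdown is not shown, so on older versions you need to iterate manually over the array of voice models returned by GetDownloadedVoiceModels to select the one you need.
In C++, selecting a voice model takes a little more work because there is no dropdown list. Use the GetDownloadedVoiceModelNames function to retrieve the names of the downloaded voice models and choose one, then call the TextToSpeechByName function with the selected voice model name to synthesize the text.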
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
// If there are downloaded voice models, use the first one to synthesize text, just as an example
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
    Synthesizer->TextToSpeechByName(VoiceName, 0, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumChannels)
    {
        UE_LOG(LogTemp, Log, TEXT("TextToSpeech result: %s, AudioData size: %d, SampleRate: %d, NumChannels: %d"), bSuccess ? TEXT("Success") : TEXT("Failed"), AudioData.Num(), SampleRate, NumChannels);
    }));
    return;
}
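By object
- Blueprint
- C++
The Text To Speech (By Object) function works in all versions of Unreal Engine, but it presents the voice models as a dropdown list of asset references, which is less intuitive. Use this method with UE 5.3 and earlier, or if your project needs a direct reference to a voice model asset for some reason.
If you have downloaded models but cannot see them, open the Voice Model dropdown, click the settings (gear) icon, and enable both Show Plugin Content and Show Engine Content to make the models visible.
In C++, selecting a voice model takes a little more work because there is no dropdown list. Use the GetDownloadedVoiceModelNames function to retrieve the names of the downloaded voice models and choose the one you need. Then call the GetVoiceModelFromName function to obtain the voice model object and pass it to the TextToSpeechByObject function to synthesize the text.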
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
// If there are downloaded voice models, use the first one to synthesize text, for example
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
    TSoftObjectPtr<URuntimeTTSModel> VoiceModel;
    if (!URuntimeTTSLibrary::GetVoiceModelFromName(VoiceName, VoiceModel))
    {
        UE_LOG(LogTemp, Error, TEXT("Failed to get voice model from name: %s"), *VoiceName.ToString());
        return;
    }
    Synthesizer->TextToSpeechByObject(VoiceModel, 0, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumChannels)
    {
        UE_LOG(LogTemp, Log, TEXT("TextToSpeech result: %s, AudioData size: %d, SampleRate: %d, NumChannels: %d"), bSuccess ? TEXT("Success") : TEXT("Failed"), AudioData.Num(), SampleRate, NumChannels);
    }));
    return;
}
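Streaming Text-to-Speech
For longer texts, or when you want to process audio data in real time as it is generated, use the streaming versions of the text-to-speech functions:
- Streaming Text To Speech (By Name) (StreamingTextToSpeechByName in C++)
- Streaming Text To Speech (By Object) (StreamingTextToSpeechByObject in C++)
These functions deliver the audio data in chunks as it is generated, so each chunk can be processed immediately without waiting for the whole synthesis to finish. This is useful for real-time audio playback, live visualization, and any other scenario that needs to process speech data incrementally.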
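The chunked delivery described above can be illustrated without the engine. The following is a minimal sketch in standard C++, not the plugin's implementation: `SimulateStreaming`, `ChunkCallback`, and `CompleteCallback` are hypothetical names, and the ordering shown (every chunk first, the last one flagged as final, then a single completion call) is an assumption about the callback contract.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback shapes, loosely modelled on the chunk and completion delegates.
using ChunkCallback = std::function<void(const std::vector<std::uint8_t>& Chunk, bool bIsFinalChunk)>;
using CompleteCallback = std::function<void(bool bSuccess, const std::string& ErrorMessage)>;

// Stand-in for a streaming synthesizer: delivers Audio in pieces of at most
// ChunkSize bytes, flags the last piece, then reports completion once.
void SimulateStreaming(const std::vector<std::uint8_t>& Audio, std::size_t ChunkSize,
                       const ChunkCallback& OnChunk, const CompleteCallback& OnComplete)
{
    assert(ChunkSize > 0);
    if (Audio.empty())
    {
        OnComplete(false, "No audio was produced");
        return;
    }
    for (std::size_t Offset = 0; Offset < Audio.size(); Offset += ChunkSize)
    {
        const std::size_t End = std::min(Offset + ChunkSize, Audio.size());
        const std::vector<std::uint8_t> Chunk(
            Audio.begin() + static_cast<std::ptrdiff_t>(Offset),
            Audio.begin() + static_cast<std::ptrdiff_t>(End));
        OnChunk(Chunk, End == Audio.size());
    }
    OnComplete(true, "");
}
```

A consumer written against this shape can start working on the first chunk while later chunks are still being produced, which is the benefit the streaming functions are meant to provide.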
Streaming by name
- Blueprint
- C++
The Streaming Text To Speech (By Name) function works like the regular version, but delivers the audio data in chunks through the On Speech Chunk delegate.
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
    Synthesizer->StreamingTextToSpeechByName(
        VoiceName,
        0,
        TEXT("This is a long text that will be synthesized in chunks."),
        FOnTTSStreamingChunkDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, const TArray<uint8>& ChunkAudioData, int32 SampleRate, int32 NumOfChannels, bool bIsFinalChunk)
        {
            // Process each chunk of audio data as it becomes available
            UE_LOG(LogTemp, Log, TEXT("Received chunk with %d bytes of audio data. Sample rate: %d, Channels: %d, Is Final: %s"),
                ChunkAudioData.Num(), SampleRate, NumOfChannels, bIsFinalChunk ? TEXT("Yes") : TEXT("No"));
            // You can start processing/playing this chunk immediately
        }),
        FOnTTSStreamingCompleteDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const FString& ErrorMessage)
        {
            // Called when the entire synthesis is complete or if it fails
            if (bSuccess)
            {
                UE_LOG(LogTemp, Log, TEXT("Streaming synthesis completed successfully"));
            }
            else
            {
                UE_LOG(LogTemp, Error, TEXT("Streaming synthesis failed: %s"), *ErrorMessage);
            }
        })
    );
}
Streaming by object
- Blueprint
- C++
The Streaming Text To Speech (By Object) function provides the same streaming functionality, but takes a voice model object reference.
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
    TSoftObjectPtr<URuntimeTTSModel> VoiceModel;
    if (!URuntimeTTSLibrary::GetVoiceModelFromName(VoiceName, VoiceModel))
    {
        UE_LOG(LogTemp, Error, TEXT("Failed to get voice model from name: %s"), *VoiceName.ToString());
        return;
    }
    Synthesizer->StreamingTextToSpeechByObject(
        VoiceModel,
        0,
        TEXT("This is a long text that will be synthesized in chunks."),
        FOnTTSStreamingChunkDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, const TArray<uint8>& ChunkAudioData, int32 SampleRate, int32 NumOfChannels, bool bIsFinalChunk)
        {
            // Process each chunk of audio data as it becomes available
            UE_LOG(LogTemp, Log, TEXT("Received chunk with %d bytes of audio data. Sample rate: %d, Channels: %d, Is Final: %s"),
                ChunkAudioData.Num(), SampleRate, NumOfChannels, bIsFinalChunk ? TEXT("Yes") : TEXT("No"));
            // You can start processing/playing this chunk immediately
        }),
        FOnTTSStreamingCompleteDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const FString& ErrorMessage)
        {
            // Called when the entire synthesis is complete or if it fails
            if (bSuccess)
            {
                UE_LOG(LogTemp, Log, TEXT("Streaming synthesis completed successfully"));
            }
            else
            {
                UE_LOG(LogTemp, Error, TEXT("Streaming synthesis failed: %s"), *ErrorMessage);
            }
        })
    );
}
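Audio playback
- Regular playback
- Streaming playback
For regular (non-streaming) text-to-speech, the On Speech Result delegate provides the synthesized audio as PCM data in float format (as a byte array in Blueprints, or as TArray<uint8> in C++), along with the Sample Rate and Num Of Channels.
For playback, it is recommended to use the Runtime Audio Importer plugin to convert the raw audio data into a playable sound wave.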
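If you want to inspect the result yourself rather than hand it straight to an importer, the byte array can be read back as float samples. The sketch below uses standard C++ containers instead of TArray so it stays engine-independent. `BytesToFloatSamples` and `PcmDurationSeconds` are hypothetical helper names, and the code assumes 32-bit floats in the host's native byte order with interleaved channels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterprets a raw byte buffer as 32-bit float PCM samples.
// Assumption: the bytes are in native byte order and hold whole samples.
std::vector<float> BytesToFloatSamples(const std::vector<std::uint8_t>& Bytes)
{
    static_assert(sizeof(float) == 4, "This sketch assumes 32-bit floats");
    assert(Bytes.size() % sizeof(float) == 0);
    const std::size_t NumSamples = Bytes.size() / sizeof(float);
    std::vector<float> Samples(NumSamples);
    if (NumSamples > 0)
    {
        // memcpy avoids the alignment and aliasing problems of a pointer cast
        std::memcpy(Samples.data(), Bytes.data(), NumSamples * sizeof(float));
    }
    return Samples;
}

// Duration in seconds of an interleaved float32 PCM buffer of NumBytes bytes.
double PcmDurationSeconds(std::size_t NumBytes, int SampleRate, int NumChannels)
{
    if (SampleRate <= 0 || NumChannels <= 0)
    {
        return 0.0;
    }
    const std::size_t NumFrames = NumBytes / (sizeof(float) * static_cast<std::size_t>(NumChannels));
    return static_cast<double>(NumFrames) / SampleRate;
}
```

For instance, a mono buffer of 88200 bytes at a sample rate of 22050 Hz holds 22050 frames, which is one second of audio.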
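- Blueprint
- C++
Here is an example of Blueprint nodes for synthesizing text and playing the audio (copyable nodes):
Here is an example of synthesizing text and playing the audio in C++: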
// Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
// Ensure "this" is a valid and referenced UObject (must not be eligible for garbage collection during the callback)
TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
// If there are downloaded voice models, use the first one to synthesize text, for example
if (DownloadedVoiceNames.Num() > 0)
{
    const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
    Synthesizer->TextToSpeechByName(VoiceName, 0, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([this](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumOfChannels)
    {
        if (!bSuccess)
        {
            UE_LOG(LogTemp, Error, TEXT("TextToSpeech failed"));
            return;
        }
        // Create the Runtime Audio Importer to process the audio data
        URuntimeAudioImporterLibrary* RuntimeAudioImporter = URuntimeAudioImporterLibrary::CreateRuntimeAudioImporter();
        // Prevent the RuntimeAudioImporter from being garbage collected by adding it to the root (you can also use a UPROPERTY, TStrongObjectPtr, etc.)
        RuntimeAudioImporter->AddToRoot();
        RuntimeAudioImporter->OnResultNative.AddWeakLambda(RuntimeAudioImporter, [this](URuntimeAudioImporterLibrary* Importer, UImportedSoundWave* ImportedSoundWave, ERuntimeImportStatus Status)
        {
            // Once done, remove it from the root to allow garbage collection
            Importer->RemoveFromRoot();
            if (Status != ERuntimeImportStatus::SuccessfulImport)
            {
                UE_LOG(LogTemp, Error, TEXT("Failed to import audio, status: %s"), *UEnum::GetValueAsString(Status));
                return;
            }
            // Play the imported sound wave (ensure a reference is kept to prevent garbage collection)
            UGameplayStatics::PlaySound2D(GetWorld(), ImportedSoundWave);
        });
        RuntimeAudioImporter->ImportAudioFromRAWBuffer(AudioData, ERuntimeRAWAudioFormat::Float32, SampleRate, NumOfChannels);
    }));
    return;
}
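For streaming text-to-speech, you receive the audio data in chunks as PCM data in float format (as a byte array in Blueprints, or as TArray<uint8> in C++), along with the Sample Rate and Num Of Channels. Each chunk can be processed as soon as it becomes available.
For real-time playback, it is recommended to use the Streaming Sound Wave from the Runtime Audio Importer plugin, which is designed for streaming audio playback and real-time processing.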
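When you handle the chunks yourself, it helps to track how much audio has arrived so far, for example to delay playback until a short lead has been buffered. The following engine-independent sketch shows one way to do that. `ChunkAccumulator` is a hypothetical type, not part of either plugin, and it assumes interleaved float32 samples at 4 bytes each.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects streamed float32 PCM chunks (as raw bytes) and tracks how much audio has arrived.
struct ChunkAccumulator
{
    std::vector<std::uint8_t> Bytes;
    int SampleRate = 0;
    int NumChannels = 0;
    bool bFinished = false;

    // Mirrors the shape of the chunk callback: bytes, format info, and a final-chunk flag.
    void OnChunk(const std::vector<std::uint8_t>& Chunk, int InSampleRate, int InNumChannels, bool bIsFinalChunk)
    {
        assert(!bFinished); // no chunks are expected after the final one
        SampleRate = InSampleRate;
        NumChannels = InNumChannels;
        Bytes.insert(Bytes.end(), Chunk.begin(), Chunk.end());
        bFinished = bIsFinalChunk;
    }

    // Seconds of audio received so far (4 bytes per sample, interleaved channels).
    double BufferedSeconds() const
    {
        if (SampleRate <= 0 || NumChannels <= 0)
        {
            return 0.0;
        }
        const std::size_t Frames = Bytes.size() / (4u * static_cast<std::size_t>(NumChannels));
        return static_cast<double>(Frames) / SampleRate;
    }
};
```

In an Unreal project the same bookkeeping would sit inside the chunk delegate, next to the call that appends the chunk to the sound wave.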
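- Blueprint
- C++
Here is an example of Blueprint nodes for streaming text-to-speech and playing the audio (copyable nodes):
Here is an example of real-time playback of streaming text-to-speech in C++: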
UPROPERTY()
URuntimeTextToSpeech* Synthesizer;
UPROPERTY()
UStreamingSoundWave* StreamingSoundWave;
UPROPERTY()
bool bIsPlaying = false;
void StartStreamingTTS()
{
    // Create synthesizer if not already created
    if (!Synthesizer)
    {
        Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
    }
    // Create a sound wave for streaming if not already created
    if (!StreamingSoundWave)
    {
        StreamingSoundWave = UStreamingSoundWave::CreateStreamingSoundWave();
        StreamingSoundWave->OnPopulateAudioStateNative.AddWeakLambda(this, [this]()
        {
            if (!bIsPlaying)
            {
                bIsPlaying = true;
                UGameplayStatics::PlaySound2D(GetWorld(), StreamingSoundWave);
            }
        });
    }
    TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
    // If there are downloaded voice models, use the first one to synthesize text, for example
    if (DownloadedVoiceNames.Num() > 0)
    {
        const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
        Synthesizer->StreamingTextToSpeechByName(
            VoiceName,
            0,
            TEXT("Streaming synthesis output begins with a steady flow of data. This data is processed in real-time to ensure consistency. As the process continues, information is streamed without interruption. The output adapts seamlessly to changing inputs. Each piece of data is instantly integrated into the stream. Real-time processing allows for immediate adjustments. This constant flow ensures that the synthesis output is dynamic. As new data comes in, the output evolves accordingly. The system is designed to maintain a continuous output stream. This uninterrupted flow is what drives the efficiency of streaming synthesis."),
            FOnTTSStreamingChunkDelegateFast::CreateWeakLambda(this, [this](URuntimeTextToSpeech* TextToSpeechInstance, const TArray<uint8>& ChunkAudioData, int32 SampleRate, int32 NumOfChannels, bool bIsFinalChunk)
            {
                StreamingSoundWave->AppendAudioDataFromRAW(ChunkAudioData, ERuntimeRAWAudioFormat::Float32, SampleRate, NumOfChannels);
            }),
            FOnTTSStreamingCompleteDelegateFast::CreateWeakLambda(this, [this](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const FString& ErrorMessage)
            {
                if (bSuccess)
                {
                    UE_LOG(LogTemp, Log, TEXT("Streaming text-to-speech synthesis is complete"));
                }
                else
                {
                    UE_LOG(LogTemp, Error, TEXT("Streaming synthesis failed: %s"), *ErrorMessage);
                }
            })
        );
    }
}
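Cancelling Text-to-Speech
You can cancel an ongoing text-to-speech synthesis operation at any time by calling the CancelSpeechSynthesis function on the synthesizer instance: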
- Blueprint
- C++
// Assuming "Synthesizer" is a valid URuntimeTextToSpeech instance
// Start a long synthesis operation
Synthesizer->TextToSpeechByName(VoiceName, 0, TEXT("Very long text..."), ...);
// Later, if you need to cancel it:
bool bWasCancelled = Synthesizer->CancelSpeechSynthesis();
if (bWasCancelled)
{
    UE_LOG(LogTemp, Log, TEXT("Successfully cancelled ongoing synthesis"));
}
else
{
    UE_LOG(LogTemp, Log, TEXT("No synthesis was in progress to cancel"));
}
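When a synthesis is cancelled:
- The synthesis process stops as soon as possible
- Any ongoing callbacks are terminated
- The completion delegate is called with bSuccess = false and an error message indicating that the synthesis was cancelled
- All resources allocated for the synthesis are properly cleaned up
This is particularly useful for long texts, or when you need to interrupt the current playback to start a new synthesis.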
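The cancellation behavior listed above can be modelled with a simple flag that the worker checks between units of work. This is an illustrative sketch in standard C++, not the plugin's implementation. `CancellableJob` and its members are hypothetical, and the job runs synchronously here to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>

// Stand-in for a synthesis job that can be cancelled between work steps.
class CancellableJob
{
public:
    // Requests cancellation; returns true only if a job was in progress.
    bool Cancel()
    {
        if (!bRunning)
        {
            return false;
        }
        bCancelRequested = true;
        return true;
    }

    // Runs NumSteps units of work, calling OnStep for each one and OnComplete exactly once.
    void Run(int NumSteps,
             const std::function<void(int)>& OnStep,
             const std::function<void(bool, const std::string&)>& OnComplete)
    {
        assert(NumSteps >= 0);
        bRunning = true;
        bCancelRequested = false;
        for (int Step = 0; Step < NumSteps; ++Step)
        {
            if (bCancelRequested)
            {
                // Stop as soon as possible and report the cancellation as a failure
                bRunning = false;
                OnComplete(false, "Synthesis was cancelled");
                return;
            }
            OnStep(Step);
        }
        bRunning = false;
        OnComplete(true, "");
    }

private:
    std::atomic<bool> bRunning{false};
    std::atomic<bool> bCancelRequested{false};
};
```

As with CancelSpeechSynthesis, the return value of `Cancel` only tells you whether there was anything to cancel; the outcome of the job itself still arrives through the completion callback.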
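Speaker selection
Both text-to-speech functions accept an optional speaker ID parameter, which is useful when working with voice models that support multiple speakers. Use the GetSpeakerCountFromVoiceModel or GetSpeakerCountFromModelName function to check whether the chosen voice model supports multiple speakers. If it does, simply specify the desired speaker ID when calling the text-to-speech functions. Some voice models offer a wide variety; for example, English LibriTTS contains over 900 different speakers to choose from.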
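Before passing a speaker ID to a synthesis call, it can be worth checking it against the speaker count reported for the model. The helper below is a hypothetical sketch in plain C++. It assumes speaker IDs are zero-based (the examples on this page pass 0) and that falling back to speaker 0 is acceptable when the requested ID is out of range.

```cpp
#include <cassert>

// Returns a speaker ID that is valid for a model with SpeakerCount speakers.
// Assumption: IDs run from 0 to SpeakerCount - 1, and 0 is a safe fallback.
int ResolveSpeakerId(int RequestedId, int SpeakerCount)
{
    assert(SpeakerCount >= 0);
    if (SpeakerCount <= 1)
    {
        return 0; // single-speaker model: only one choice
    }
    if (RequestedId < 0 || RequestedId >= SpeakerCount)
    {
        return 0; // out of range: fall back to the first speaker
    }
    return RequestedId;
}
```

The count passed in would come from GetSpeakerCountFromVoiceModel or GetSpeakerCountFromModelName, and the resolved ID would replace the literal 0 used in the synthesis examples.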
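The Runtime Audio Importer plugin also provides additional features, such as exporting audio data to a file or passing it to SoundCue, MetaSound, and more. For details, see the Runtime Audio Importer documentation.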