كيفية استخدام الملحق

يقوم ملحق Runtime Text To Speech بتحويل النص إلى كلام باستخدام نماذج صوتية قابلة للتنزيل. تتم إدارة هذه النماذج في إعدادات الملحق داخل المحرر، يتم تنزيلها وتعبئتها للاستخدام أثناء التشغيل. اتبع الخطوات أدناه للبدء.

جانب المحرر

قم بتنزيل نماذج الصوت المناسبة لمشروعك كما هو موضح هنا. يمكنك تنزيل عدة نماذج صوتية في نفس الوقت.

جانب التشغيل

قم بإنشاء المزج الصوتي باستخدام دالة CreateRuntimeTextToSpeech. تأكد من الحفاظ على مرجع له (مثل متغير منفصل في Blueprints أو UPROPERTY في ++C) لمنعه من جمع القمامة.

Blueprint
C++

مثال لإنشاء مزج صوتي Runtime Text To Speech في Blueprints

// Create the Runtime Text To Speech synthesizer in C++
URuntimeTextToSpeech* Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
// Ensure the synthesizer is referenced correctly to prevent garbage collection (e.g. as a UPROPERTY)

توليف الكلام

يقدم البرنامج المساعد وضعين لتوليف النص إلى كلام:

تحويل النص إلى كلام العادي: يقوم بتوليف النص بالكامل وإرجاع الصوت المكتمل عند الانتهاء
تحويل النص إلى كلام المتدفق: يوفر مقاطع صوتية أثناء توليدها، مما يسمح بالمعالجة في الوقت الفعلي

يدعم كل وضع طريقتين لاختيار نماذج الصوت:

بالاسم: حدد نموذج صوت باسمه (موصى به لـ UE 5.4+)
بالكائن: حدد نموذج صوت بالإشارة المباشرة (موصى به لـ UE 5.3 والإصدارات الأقدم)

تحويل النص إلى كلام العادي

بالاسم

Blueprint
C++

تكون وظيفة Text To Speech (By Name) أكثر ملاءمة في Blueprints بدءًا من UE 5.4. تتيح لك اختيار نماذج الصوت من قائمة منسدلة للنماذج التي تم تنزيلها. في إصدارات UE الأقدم من 5.3، لا تظهر هذه القائمة المنسدلة، لذا إذا كنت تستخدم إصدارًا أقدم، فستحتاج إلى التكرار يدويًا عبر مصفوفة نماذج الصوت التي تم إرجاعها بواسطة GetDownloadedVoiceModels لتحديد النموذج الذي تحتاجه.

مثال لاستخدام Text To Speech بالاسم في Blueprints

في C++، يمكن أن يكون اختيار نماذج الصوت أكثر تعقيدًا بعض الشيء بسبب عدم وجود قائمة منسدلة. يمكنك استخدام وظيفة GetDownloadedVoiceModelNames لاسترداد أسماء نماذج الصوت التي تم تنزيلها وتحديد النموذج الذي تحتاجه. بعد ذلك، يمكنك استدعاء وظيفة TextToSpeechByName لتوليف النص باستخدام اسم نموذج الصوت المحدد.

    // Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)

    TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

    // If there are downloaded voice models, use the first one to synthesize text, just as an example
    if (DownloadedVoiceNames.Num() > 0)
    {
    	const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
    	Synthesizer->TextToSpeechByName(VoiceName, 0, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumChannels)
    	{
    		UE_LOG(LogTemp, Log, TEXT("TextToSpeech result: %s, AudioData size: %d, SampleRate: %d, NumChannels: %d"), bSuccess ? TEXT("Success") : TEXT("Failed"), AudioData.Num(), SampleRate, NumChannels);
    	}));
    	return;
    }

حسب الكائن

Blueprint
C++

تعمل وظيفة Text To Speech (By Object) عبر جميع إصدارات Unreal Engine ولكنها تعرض نماذج الصوت كقائمة منسدلة من مراجع الأصول، مما يجعلها أقل سهولة. هذه الطريقة مناسبة لـ UE 5.3 والإصدارات الأقدم، أو إذا كان مشروعك يتطلب مرجعًا مباشرًا إلى أصل نموذج صوتي لأي سبب كان.

مثال على استخدام Text To Speech حسب الكائن في Blueprints

إذا كنت قد قمت بتنزيل النماذج ولكن لا يمكنك رؤيتها، افتح القائمة المنسدلة Voice Model، انقر على الإعدادات (أيقونة الترس)، وقم بتمكين كل من Show Plugin Content و Show Engine Content لجعل النماذج مرئية.

في C++، يمكن أن يكون اختيار نماذج الصوت أكثر تعقيدًا قليلاً بسبب عدم وجود قائمة منسدلة. يمكنك استخدام الدالة GetDownloadedVoiceModelNames لاسترداد أسماء نماذج الصوت التي تم تنزيلها وتحديد النموذج الذي تحتاجه. ثم يمكنك استدعاء الدالة GetVoiceModelFromName للحصول على كائن نموذج الصوت وتمريره إلى الدالة TextToSpeechByObject لتحويل النص إلى كلام.

    // Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
 
    TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

	// If there are downloaded voice models, use the first one to synthesize text, for example
	if (DownloadedVoiceNames.Num() > 0)
	{
		const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
		TSoftObjectPtr<URuntimeTTSModel> VoiceModel;
		if (!URuntimeTTSLibrary::GetVoiceModelFromName(VoiceName, VoiceModel))
		{
			UE_LOG(LogTemp, Error, TEXT("Failed to get voice model from name: %s"), *VoiceName.ToString());
			return;
		}
		Synthesizer->TextToSpeechByObject(VoiceModel, 0, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumChannels)
		{
			UE_LOG(LogTemp, Log, TEXT("TextToSpeech result: %s, AudioData size: %d, SampleRate: %d, NumChannels: %d"), bSuccess ? TEXT("Success") : TEXT("Failed"), AudioData.Num(), SampleRate, NumChannels);
		}));
		return;
	}

تحويل النص إلى كلام بشكل متدفق

للنصوص الطويلة أو عندما تريد معالجة بيانات الصوت في الوقت الفعلي أثناء توليدها، يمكنك استخدام الإصدارات المتدفقة من وظائف تحويل النص إلى كلام:

Streaming Text To Speech (By Name) (StreamingTextToSpeechByName في ++C)
Streaming Text To Speech (By Object) (StreamingTextToSpeechByObject في ++C)

توفر هذه الوظائف بيانات الصوت على شكل قطع أثناء توليدها، مما يسمح بالمعالجة الفورية دون الانتظار حتى اكتمال التوليد بالكامل. هذا مفيد لتطبيقات مختلفة مثل تشغيل الصوت في الوقت الفعلي، التصور الحي، أو أي سيناريو تحتاج فيه إلى معالجة بيانات الكلام بشكل تدريجي.

التدفق حسب الاسم

Blueprint
C++

تعمل وظيفة Streaming Text To Speech (By Name) بشكل مشابه للنسخة العادية ولكنها توفر الصوت على شكل قطع من خلال المفوض On Speech Chunk.

مثال لاستخدام Streaming Text To Speech by Name في Blueprints

    // Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object
    
    TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
    
    if (DownloadedVoiceNames.Num() > 0)
    {
        const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
        
        Synthesizer->StreamingTextToSpeechByName(
            VoiceName, 
            0, 
            TEXT("This is a long text that will be synthesized in chunks."),
            FOnTTSStreamingChunkDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, const TArray<uint8>& ChunkAudioData, int32 SampleRate, int32 NumOfChannels, bool bIsFinalChunk)
            {
                // Process each chunk of audio data as it becomes available
                UE_LOG(LogTemp, Log, TEXT("Received chunk %d with %d bytes of audio data. Sample rate: %d, Channels: %d, Is Final: %s"), 
                    ChunkIndex, ChunkAudioData.Num(), SampleRate, NumOfChannels, bIsFinalChunk ? TEXT("Yes") : TEXT("No"));
                
                // You can start processing/playing this chunk immediately
            }),
            FOnTTSStreamingCompleteDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const FString& ErrorMessage)
            {
                // Called when the entire synthesis is complete or if it fails
                if (bSuccess)
                {
                    UE_LOG(LogTemp, Log, TEXT("Streaming synthesis completed successfully"));
                }
                else
                {
                    UE_LOG(LogTemp, Error, TEXT("Streaming synthesis failed: %s"), *ErrorMessage);
                }
            })
        );
    }

البث حسب الكائن

Blueprint
C++

توفر دالة Streaming Text To Speech (By Object) نفس وظيفة البث ولكنها تأخذ مرجع كائن نموذج صوتي.

مثال على استخدام Streaming Text To Speech by Object في Blueprints

    // Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object
    
    TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();
    
    if (DownloadedVoiceNames.Num() > 0)
    {
        const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
        TSoftObjectPtr<URuntimeTTSModel> VoiceModel;
        
        if (!URuntimeTTSLibrary::GetVoiceModelFromName(VoiceName, VoiceModel))
        {
            UE_LOG(LogTemp, Error, TEXT("Failed to get voice model from name: %s"), *VoiceName.ToString());
            return;
        }
        
        Synthesizer->StreamingTextToSpeechByObject(
            VoiceModel, 
            0, 
            TEXT("This is a long text that will be synthesized in chunks."),
            FOnTTSStreamingChunkDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, const TArray<uint8>& ChunkAudioData, int32 SampleRate, int32 NumOfChannels, bool bIsFinalChunk)
            {
                // Process each chunk of audio data as it becomes available
                UE_LOG(LogTemp, Log, TEXT("Received chunk %d with %d bytes of audio data. Sample rate: %d, Channels: %d, Is Final: %s"), 
                    ChunkIndex, ChunkAudioData.Num(), SampleRate, NumOfChannels, bIsFinalChunk ? TEXT("Yes") : TEXT("No"));
                
                // You can start processing/playing this chunk immediately
            }),
            FOnTTSStreamingCompleteDelegateFast::CreateLambda([](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const FString& ErrorMessage)
            {
                // Called when the entire synthesis is complete or if it fails
                if (bSuccess)
                {
                    UE_LOG(LogTemp, Log, TEXT("Streaming synthesis completed successfully"));
                }
                else
                {
                    UE_LOG(LogTemp, Error, TEXT("Streaming synthesis failed: %s"), *ErrorMessage);
                }
            })
        );
    }

تشغيل الصوت

التشغيل العادي
تشغيل البث

لتحويل النص إلى كلام العادي (غير المتدفق)، يوفر المفوض On Speech Result الصوت المُركب كبيانات PCM بتنسيق float (كمصفوفة بايت في Blueprints أو TArray<uint8> في C++)، بالإضافة إلى Sample Rate و Num Of Channels.

للتشغيل، يُوصى باستخدام ملحق Runtime Audio Importer لتحويل بيانات الصوت الخام إلى موجة صوت قابلة للتشغيل.

بلو برينت
سي++

إليك مثالًا لكيفية ظهور عقد Blueprint لتركيب النص وتشغيل الصوت (عقد قابلة للنسخ):

إليك مثالًا لكيفية تركيب النص وتشغيل الصوت في C++:

        // Assuming "Synthesizer" is a valid and referenced URuntimeTextToSpeech object (ensure it is not eligible for garbage collection during the callback)
        // Ensure "this" is a valid and referenced UObject (must not be eligible for garbage collection during the callback)
     
        TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

        // If there are downloaded voice models, use the first one to synthesize text, for example
        if (DownloadedVoiceNames.Num() > 0)
        {
            const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model
            Synthesizer->TextToSpeechByName(VoiceName, 0, TEXT("Text example 123"), FOnTTSResultDelegateFast::CreateLambda([this](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const TArray<uint8>& AudioData, int32 SampleRate, int32 NumOfChannels)
            {
                if (!bSuccess)
                {
                    UE_LOG(LogTemp, Error, TEXT("TextToSpeech failed"));
                    return;
                }

                // Create the Runtime Audio Importer to process the audio data
                URuntimeAudioImporterLibrary* RuntimeAudioImporter = URuntimeAudioImporterLibrary::CreateRuntimeAudioImporter();

                // Prevent the RuntimeAudioImporter from being garbage collected by adding it to the root (you can also use a UPROPERTY, TStrongObjectPtr, etc.)
                RuntimeAudioImporter->AddToRoot();

                RuntimeAudioImporter->OnResultNative.AddWeakLambda(RuntimeAudioImporter, [this](URuntimeAudioImporterLibrary* Importer, UImportedSoundWave* ImportedSoundWave, ERuntimeImportStatus Status)
                {
                    // Once done, remove it from the root to allow garbage collection
                    Importer->RemoveFromRoot();
                    
                    if (Status != ERuntimeImportStatus::SuccessfulImport)
                    {
                        UE_LOG(LogTemp, Error, TEXT("Failed to import audio, status: %s"), *UEnum::GetValueAsString(Status));
                        return;
                    }

                    // Play the imported sound wave (ensure a reference is kept to prevent garbage collection)
                    UGameplayStatics::PlaySound2D(GetWorld(), ImportedSoundWave);
                });

                RuntimeAudioImporter->ImportAudioFromRAWBuffer(AudioData, ERuntimeRAWAudioFormat::Float32, SampleRate, NumOfChannels);
            }));
            return;
        }

لبث تحويل النص إلى كلام، ستتلقى بيانات الصوت على شكل قطع كبيانات PCM بتنسيق float (كمصفوفة بايت في Blueprints أو TArray<uint8> في ++C)، بالإضافة إلى معدل العينة وعدد القنوات. يمكن معالجة كل قطعة فور توفرها.

للتشغيل في الوقت الفعلي، يُوصى باستخدام مكون Runtime Audio Importer الإضافي Streaming Sound Wave، المصمم خصيصًا لتشغيل الصوت عبر البث أو المعالجة في الوقت الفعلي.

بلوبرينت
++C

إليك مثالًا لكيفية ظهور عقد Blueprint لبث تحويل النص إلى كلام وتشغيل الصوت (عقد قابلة للنسخ):

مثال لبث TTS مع تشغيل الصوت في Blueprints

إليك مثالًا لكيفية تنفيذ بث تحويل النص إلى كلام مع تشغيل في الوقت الفعلي بلغة ++C:

        UPROPERTY()
        URuntimeTextToSpeech* Synthesizer;

        UPROPERTY()
        UStreamingSoundWave* StreamingSoundWave;

        UPROPERTY()
        bool bIsPlaying = false;

        void StartStreamingTTS()
        {
            // Create synthesizer if not already created
            if (!Synthesizer)
            {
                Synthesizer = URuntimeTextToSpeech::CreateRuntimeTextToSpeech();
            }

            // Create a sound wave for streaming if not already created
            if (!StreamingSoundWave)
            {
                StreamingSoundWave = UStreamingSoundWave::CreateStreamingSoundWave();
                StreamingSoundWave->OnPopulateAudioStateNative.AddWeakLambda(this, [this]()
                {
                    if (!bIsPlaying)
                    {
                        bIsPlaying = true;
                        UGameplayStatics::PlaySound2D(GetWorld(), StreamingSoundWave);
                    }
                });
            }

            TArray<FName> DownloadedVoiceNames = URuntimeTTSLibrary::GetDownloadedVoiceModelNames();

            // If there are downloaded voice models, use the first one to synthesize text, for example
            if (DownloadedVoiceNames.Num() > 0)
            {
                const FName& VoiceName = DownloadedVoiceNames[0]; // Select the first available voice model

                Synthesizer->StreamingTextToSpeechByName(
                    VoiceName,
                    0,
                    TEXT("Streaming synthesis output begins with a steady flow of data. This data is processed in real-time to ensure consistency. As the process continues, information is streamed without interruption. The output adapts seamlessly to changing inputs. Each piece of data is instantly integrated into the stream. Real-time processing allows for immediate adjustments. This constant flow ensures that the synthesis output is dynamic. As new data comes in, the output evolves accordingly. The system is designed to maintain a continuous output stream. This uninterrupted flow is what drives the efficiency of streaming synthesis."),
                    FOnTTSStreamingChunkDelegateFast::CreateWeakLambda(this, [this](URuntimeTextToSpeech* TextToSpeechInstance, const TArray<uint8>& ChunkAudioData, int32 SampleRate, int32 NumOfChannels, bool bIsFinalChunk)
                    {
                        StreamingSoundWave->AppendAudioDataFromRAW(ChunkAudioData, ERuntimeRAWAudioFormat::Float32, SampleRate, NumOfChannels);
                    }),
                    FOnTTSStreamingCompleteDelegateFast::CreateWeakLambda(this, [this](URuntimeTextToSpeech* TextToSpeechInstance, bool bSuccess, const FString& ErrorMessage)
                    {
                        if (bSuccess)
                        {
                            UE_LOG(LogTemp, Log, TEXT("Streaming text-to-speech synthesis is complete"));
                        }
                        else
                        {
                            UE_LOG(LogTemp, Error, TEXT("Streaming synthesis failed: %s"), *ErrorMessage);
                        }
                    })
                );
            }
        }

إلغاء تحويل النص إلى كلام

يمكنك إلغاء عملية تحويل النص إلى كلام الجارية في أي وقت عن طريق استدعاء الدالة CancelSpeechSynthesis على مثيل المولّف الخاص بك:

Blueprint
C++

إلغاء تحويل النص إلى كلام في Blueprints

    // Assuming "Synthesizer" is a valid URuntimeTextToSpeech instance
    
    // Start a long synthesis operation
    Synthesizer->TextToSpeechByName(VoiceName, 0, TEXT("Very long text..."), ...);
    
    // Later, if you need to cancel it:
    bool bWasCancelled = Synthesizer->CancelSpeechSynthesis();
    
    if (bWasCancelled)
    {
        UE_LOG(LogTemp, Log, TEXT("Successfully cancelled ongoing synthesis"));
    }
    else
    {
        UE_LOG(LogTemp, Log, TEXT("No synthesis was in progress to cancel"));
    }

عند إلغاء عملية التوليف:

ستتوقف عملية التوليف في أسرع وقت ممكن
سيتم إنهاء أي استدعاءات جارية
سيتم استدعاء مندوب الإكمال مع bSuccess = false ورسالة خطأ تشير إلى إلغاء التوليف
سيتم تنظيف أي موارد مخصصة للتوليف بشكل صحيح

هذا مفيد بشكل خاص للنصوص الطويلة أو عندما تحتاج إلى مقاطعة التشغيل لبدء توليف جديد.

اختيار المتحدث

تقبل وظيفتا Text To Speech معلمة اختيارية لمعرف المتحدث، وهي مفيدة عند العمل مع نماذج الأصوات التي تدعم متحدثين متعددين. يمكنك استخدام وظائف GetSpeakerCountFromVoiceModel أو GetSpeakerCountFromModelName للتحقق مما إذا كان نموذج الصوت المختار يدعم متحدثين متعددين. إذا توفر متحدثون متعددون، ما عليك سوى تحديد معرف المتحدث المطلوب عند استدعاء وظائف Text To Speech. بعض نماذج الأصوات توفر تنوعًا واسعًا - على سبيل المثال، English LibriTTS يتضمن أكثر من 900 متحدث مختلف للاختيار من بينهم.

يوفر ملحق Runtime Audio Importer أيضًا ميزات إضافية مثل تصدير بيانات الصوت إلى ملف، أو تمريرها إلى SoundCue وMetaSound وغير ذلك. لمزيد من التفاصيل، راجع وثائق Runtime Audio Importer.

جانب المحرر​

جانب التشغيل​

توليف الكلام​

تحويل النص إلى كلام العادي​

بالاسم​

حسب الكائن​

تحويل النص إلى كلام بشكل متدفق​

التدفق حسب الاسم​

البث حسب الكائن​

تشغيل الصوت​

إلغاء تحويل النص إلى كلام​

اختيار المتحدث​