Voice activity detection
Streaming Sound Wave, along with its derived types such as Capturable Sound Wave, supports Voice Activity Detection (VAD). VAD filters incoming audio data to populate the internal buffer only when voice is detected. This feature is implemented using libfvad.
To enable VAD after creating the sound wave, call the ToggleVAD function with a value of true.
- Blueprint
- C++
// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
StreamingSoundWave->ToggleVAD(true);
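To illustrate what "populating the buffer only when voice is detected" means, here is a minimal standalone C++ sketch. It is not the plugin's implementation: IsVoiceFrame is a hypothetical energy-threshold stand-in for the libfvad decision, and GatedBuffer, bVADEnabled, and the threshold value are names and numbers assumed purely for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real VAD decision (the plugin itself uses libfvad).
// Classifies a frame as voice when its RMS level exceeds a fixed threshold.
bool IsVoiceFrame(const std::vector<int16_t>& Frame, double RmsThreshold = 500.0)
{
    assert(RmsThreshold >= 0.0);
    if (Frame.empty())
    {
        return false;
    }
    double SumSquares = 0.0;
    for (int16_t Sample : Frame)
    {
        SumSquares += static_cast<double>(Sample) * static_cast<double>(Sample);
    }
    return std::sqrt(SumSquares / static_cast<double>(Frame.size())) > RmsThreshold;
}

// Hypothetical gated buffer: when VAD is enabled, frames that are not
// classified as voice are dropped instead of being appended.
struct GatedBuffer
{
    bool bVADEnabled = false;
    std::vector<int16_t> Samples;

    void Append(const std::vector<int16_t>& Frame)
    {
        if (bVADEnabled && !IsVoiceFrame(Frame))
        {
            return; // Non-voice audio never reaches the buffer
        }
        Samples.insert(Samples.end(), Frame.begin(), Frame.end());
    }
};
```

With bVADEnabled set to false, every frame is appended, which mirrors the behavior of a sound wave on which VAD has not been toggled on.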
Once VAD is enabled, you can set the VAD mode with SetVADMode or reset the VAD with ResetVAD as needed. Call these functions only while VAD is enabled.
- Blueprint
- C++
// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
// Set the VAD mode
StreamingSoundWave->SetVADMode(ERuntimeVADMode::VeryAggressive);
// Reset the VAD
StreamingSoundWave->ResetVAD();
Speech Start and End detection
In addition to detecting the presence of speech, Voice Activity Detection can detect when speech activity starts and ends. This is useful for triggering events when speech begins or ends during playback or capture.
You can customize the sensitivity of speech start and end detection by adjusting the minimum speech duration and the silence duration. These parameters help avoid false positives, such as brief noises being treated as speech or short pauses between words being treated as the end of speech.
Minimum Speech Duration
The Minimum Speech Duration parameter sets the minimum amount of continuous voice activity required to trigger a speech start event. This filters out brief noises that shouldn't be considered speech, so that only sustained voice activity is recognized. The default value for Minimum Speech Duration is 300 milliseconds.
- Blueprint
- C++
// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
// Set the minimum speech duration (in milliseconds)
StreamingSoundWave->SetMinimumSpeechDuration(200);
Silence Duration
The Silence Duration parameter sets the duration of silence required to trigger a speech end event. This prevents speech detection from ending prematurely during natural pauses between words or sentences. The default value for Silence Duration is 500 milliseconds.
- Blueprint
- C++
// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
// Set the silence duration (in milliseconds)
StreamingSoundWave->SetSilenceDuration(700);
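The way the two thresholds interact can be sketched as a small state machine driven by per-frame VAD decisions. This is a standalone illustration under stated assumptions, not the plugin's code: SpeechBoundaryDetector, ProcessFrame, and the callback members are hypothetical names, and the 300 ms and 500 ms defaults are simply the values quoted above.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of speech start/end detection with two thresholds.
class SpeechBoundaryDetector
{
public:
    std::function<void()> OnSpeechStarted; // Invoked once when speech begins
    std::function<void()> OnSpeechEnded;   // Invoked once when speech ends

    explicit SpeechBoundaryDetector(int InMinimumSpeechMs = 300, int InSilenceMs = 500)
        : MinimumSpeechMs(InMinimumSpeechMs), SilenceMs(InSilenceMs)
    {
        assert(MinimumSpeechMs >= 0 && SilenceMs >= 0);
    }

    // Feed one VAD decision that covers FrameMs milliseconds of audio.
    void ProcessFrame(bool bVoiceDetected, int FrameMs)
    {
        assert(FrameMs > 0);
        if (bVoiceDetected)
        {
            SilenceRunMs = 0; // Any voice interrupts the current silence run
            if (!bSpeechActive)
            {
                VoiceRunMs += FrameMs;
                if (VoiceRunMs >= MinimumSpeechMs)
                {
                    bSpeechActive = true;
                    VoiceRunMs = 0;
                    if (OnSpeechStarted) { OnSpeechStarted(); }
                }
            }
        }
        else
        {
            VoiceRunMs = 0; // Any silence interrupts the current voice run
            if (bSpeechActive)
            {
                SilenceRunMs += FrameMs;
                if (SilenceRunMs >= SilenceMs)
                {
                    bSpeechActive = false;
                    SilenceRunMs = 0;
                    if (OnSpeechEnded) { OnSpeechEnded(); }
                }
            }
        }
    }

    bool IsSpeechActive() const { return bSpeechActive; }

private:
    int MinimumSpeechMs;
    int SilenceMs;
    int VoiceRunMs = 0;   // Continuous voice accumulated while waiting for a start
    int SilenceRunMs = 0; // Continuous silence accumulated while waiting for an end
    bool bSpeechActive = false;
};
```

In this sketch, a short burst of voice followed by silence resets the voice run and never triggers a start, while a pause shorter than the silence duration does not end an active speech segment. Lowering the minimum speech duration makes start detection more responsive at the cost of more false starts, and raising the silence duration tolerates longer pauses at the cost of a later end event.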
Binding to Speech Delegates
You can bind to delegates that are broadcast when speech starts or ends. This is useful for triggering custom behavior based on speech activity, such as starting or stopping text recognition, or adjusting the volume of other audio sources.
- Blueprint
- C++
// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
// Bind to the OnSpeechStartedNative delegate
StreamingSoundWave->OnSpeechStartedNative.AddWeakLambda(this, [this]()
{
    // Handle the start of speech
});
// Bind to the OnSpeechEndedNative delegate
StreamingSoundWave->OnSpeechEndedNative.AddWeakLambda(this, [this]()
{
    // Handle the end of speech
});
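The snippet above binds with AddWeakLambda, which ties each handler to the lifetime of the owning object so that the lambda is not invoked after that object has been destroyed. The idea can be sketched in standalone C++ with std::weak_ptr. This is an analogy only: WeakMulticastEvent, AddWeak, and Broadcast are hypothetical names and do not reflect how Unreal Engine delegates are implemented.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical multicast event whose handlers are tied to an owner's lifetime.
class WeakMulticastEvent
{
public:
    // Register a handler that is only invoked while Owner is still alive.
    template <typename OwnerType>
    void AddWeak(const std::shared_ptr<OwnerType>& Owner, std::function<void()> Handler)
    {
        assert(Owner && Handler);
        std::weak_ptr<void> WeakOwner = Owner;
        Handlers.push_back({WeakOwner, std::move(Handler)});
    }

    // Invoke every handler whose owner has not been destroyed.
    void Broadcast()
    {
        for (const Binding& Entry : Handlers)
        {
            // Pin the owner for the duration of the call; skip expired owners
            if (std::shared_ptr<void> Pinned = Entry.Owner.lock())
            {
                Entry.Handler();
            }
        }
    }

private:
    struct Binding
    {
        std::weak_ptr<void> Owner;
        std::function<void()> Handler;
    };
    std::vector<Binding> Handlers;
};
```

The practical consequence for the bindings shown earlier is that a handler capturing this does not need a manual validity check for the owning object, although anything else it captures remains the caller's responsibility.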