Voice Activity Detection

Streaming Sound Wave, along with its derived types such as Capturable Sound Wave, supports Voice Activity Detection (VAD). When VAD is enabled, incoming audio is filtered so that the internal buffer is populated only while voice is detected.

The plugin offers two VAD implementations:

Default VAD
Silero VAD

The default implementation uses libfvad, a lightweight voice activity detection library that works efficiently across all platforms and engine versions supported by Runtime Audio Importer.

Basic Usage

To enable VAD after creating a sound wave, use the ToggleVAD function:

In Blueprint, use the Toggle VAD node. In C++:

// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
StreamingSoundWave->ToggleVAD(true);

After enabling VAD, you can reset it at any time:

In Blueprint, use the Reset VAD node. In C++:

// Reset the VAD
StreamingSoundWave->ResetVAD();

Default VAD Settings

When using the default VAD provider, you can adjust its aggressiveness by changing the VAD mode:

In Blueprint, use the Set VAD Mode node. In C++:

// Set the VAD mode (only works with the default VAD provider)
StreamingSoundWave->SetVADMode(ERuntimeVADMode::VeryAggressive);

The mode parameter controls how aggressively the VAD filters audio. More aggressive modes are more restrictive. They are less likely to report false positives (non-speech classified as speech), but they may miss some speech.

VAD Providers

After enabling VAD with the ToggleVAD function, you can choose between different Voice Activity Detection providers to suit your needs. The default provider is built-in, while additional providers such as Silero VAD are available through extension plugins.

In Blueprint, use the Set VAD Provider node. In C++:

// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)
// Make sure to call ToggleVAD(true) before setting the provider

// Set the VAD provider to Silero VAD
StreamingSoundWave->SetVADProvider(URuntimeSileroVADProvider::StaticClass());
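Conceptually, swapping providers resembles a strategy pattern. The sound wave asks whichever provider is active whether each incoming frame contains voice, and it keeps only the voiced frames. The standalone sketch below illustrates that idea under stated assumptions:

- IVadProvider, EnergyVadProvider, and StreamingBuffer are hypothetical names, not the plugin's classes.
- The energy threshold is a toy stand-in for a real detector. It is not how libfvad or Silero VAD decide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical provider interface: decides whether one frame of audio contains voice.
class IVadProvider
{
public:
    virtual ~IVadProvider() = default;
    virtual bool IsVoiced(const std::vector<std::int16_t>& Frame) = 0;
};

// Toy provider: a frame counts as voiced when its mean absolute amplitude
// exceeds a threshold. Real providers (libfvad, Silero) work differently.
class EnergyVadProvider : public IVadProvider
{
public:
    explicit EnergyVadProvider(int InThreshold) : Threshold(InThreshold) {}

    bool IsVoiced(const std::vector<std::int16_t>& Frame) override
    {
        if (Frame.empty())
        {
            return false;
        }
        long long Sum = 0;
        for (std::int16_t Sample : Frame)
        {
            Sum += std::abs(static_cast<int>(Sample));
        }
        return Sum / static_cast<long long>(Frame.size()) > Threshold;
    }

private:
    int Threshold;
};

// Hypothetical buffer that keeps only the frames the active provider marks as voiced.
class StreamingBuffer
{
public:
    void SetProvider(std::unique_ptr<IVadProvider> NewProvider)
    {
        Provider = std::move(NewProvider);
    }

    void Push(const std::vector<std::int16_t>& Frame)
    {
        assert(Provider && "Set a provider before pushing audio");
        if (Provider->IsVoiced(Frame))
        {
            Samples.insert(Samples.end(), Frame.begin(), Frame.end());
        }
    }

    const std::vector<std::int16_t>& GetSamples() const { return Samples; }

private:
    std::unique_ptr<IVadProvider> Provider;
    std::vector<std::int16_t> Samples;
};
```

Because the buffer only depends on the interface, replacing EnergyVadProvider with another implementation changes detection quality without touching the buffering logic.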

Silero VAD Extension

Silero VAD provides more accurate speech detection using neural networks. To use it:

Ensure the Runtime Audio Importer plugin is already installed in your project
Download the Silero VAD extension plugin from Google Drive
Extract the folder from the downloaded archive into the Plugins folder of your project (create this folder if it doesn't exist)
Rebuild your project (this extension requires a C++ project)

Important

The default VAD works with all engine versions supported by Runtime Audio Importer (UE 4.24, 4.25, 4.26, 4.27, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6)
Silero VAD supports Unreal Engine 4.27 and all UE5 versions (4.27, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6)
Silero VAD is currently available for Windows only
This extension is provided as source code and requires a C++ project to use
For more information on how to build plugins manually, see the Building Plugins tutorial

Once installed, you can select it as your VAD provider by passing the Silero provider class (URuntimeSileroVADProvider::StaticClass()) to the SetVADProvider function.

Speech Start and End Detection

In addition to detecting the presence of speech, Voice Activity Detection can detect when speech activity starts and ends. This is useful for triggering events when speech begins or ends during playback or capture.

You can customize the sensitivity of speech start and end detection by adjusting the minimum speech duration and the silence duration. These parameters help fine-tune detection to avoid false triggers. Examples include a brief noise being reported as the start of speech, or a short pause between words being reported as the end of speech.

Minimum Speech Duration

The Minimum Speech Duration parameter sets the minimum amount of continuous voice activity required to trigger a speech start event. This filters out brief noises that shouldn't be considered speech, so that only sustained voice activity is recognized. The default value for Minimum Speech Duration is 300 milliseconds.

In Blueprint, use the Set Minimum Speech Duration node. In C++:

// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)

// Set the minimum speech duration (in milliseconds)
StreamingSoundWave->SetMinimumSpeechDuration(200);
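One way to picture a minimum speech duration is as a count of consecutive voiced frames. The sketch below is a hypothetical standalone illustration, not the plugin's implementation. It assumes two things:

- The detector produces one voiced/unvoiced decision per fixed-length frame.
- Each frame is 20 ms long. This is an assumed value.

Under those assumptions, the 200 ms setting above corresponds to 10 consecutive voiced frames.

```cpp
#include <cassert>

// Hypothetical sketch (not the plugin's code): fires a "speech started" signal
// once voice has been detected for at least MinSpeechMs of consecutive frames.
class SpeechStartDetector
{
public:
    SpeechStartDetector(int InMinSpeechMs, int InFrameMs)
    {
        assert(InMinSpeechMs >= 0 && InFrameMs > 0);
        // Round up: 300 ms of 20 ms frames needs 15 voiced frames in a row.
        RequiredFrames = (InMinSpeechMs + InFrameMs - 1) / InFrameMs;
        if (RequiredFrames < 1)
        {
            RequiredFrames = 1;
        }
    }

    // Feed one per-frame VAD decision. Returns true only on the frame where
    // the current run of voiced frames first reaches the required length.
    bool Process(bool bVoiced)
    {
        VoicedRun = bVoiced ? VoicedRun + 1 : 0;
        return VoicedRun == RequiredFrames;
    }

private:
    int RequiredFrames = 1;
    int VoicedRun = 0;
};
```

The counter resets on every unvoiced frame, so a click lasting a few frames never reaches the threshold. This sketch has no notion of speech ending. That is the job of the silence duration.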

Silence Duration

The Silence Duration parameter sets the duration of silence required to trigger a speech end event. This prevents speech detection from ending prematurely during natural pauses between words or sentences. The default value for Silence Duration is 500 milliseconds.

In Blueprint, use the Set Silence Duration node. In C++:

// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)

// Set the silence duration (in milliseconds)
StreamingSoundWave->SetSilenceDuration(700);
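Speech end detection can be pictured the same way, by counting consecutive non-voiced frames while speech is active. This is again a hypothetical standalone illustration with an assumed 20 ms frame length, not the plugin's implementation. With the 700 ms setting above, speech ends only after 35 silent frames in a row. A pause of a few hundred milliseconds between words therefore keeps speech active.

```cpp
#include <cassert>

// Hypothetical sketch (not the plugin's code): once speech is active, fires a
// "speech ended" signal after SilenceMs of consecutive non-voiced frames.
// Any voiced frame resets the silence counter, so short pauses are ignored.
class SpeechEndDetector
{
public:
    SpeechEndDetector(int InSilenceMs, int InFrameMs)
    {
        assert(InSilenceMs >= 0 && InFrameMs > 0);
        RequiredFrames = (InSilenceMs + InFrameMs - 1) / InFrameMs;
        if (RequiredFrames < 1)
        {
            RequiredFrames = 1;
        }
    }

    // Call when a speech start has been detected.
    void BeginSpeech()
    {
        bSpeaking = true;
        SilentRun = 0;
    }

    // Feed one per-frame VAD decision; returns true on the frame where
    // speech is considered to have ended.
    bool Process(bool bVoiced)
    {
        if (!bSpeaking)
        {
            return false;
        }
        SilentRun = bVoiced ? 0 : SilentRun + 1;
        if (SilentRun >= RequiredFrames)
        {
            bSpeaking = false;
            return true;
        }
        return false;
    }

    bool IsSpeaking() const { return bSpeaking; }

private:
    int RequiredFrames = 1;
    int SilentRun = 0;
    bool bSpeaking = false;
};
```

A longer silence duration makes detection more tolerant of natural pauses, but the end event fires later after the speaker actually stops.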

Binding to Speech Delegates

You can bind to specific delegates when speech starts or ends. This is useful for triggering custom behavior based on speech activity, such as starting or stopping text recognition, or adjusting the volume of other audio sources.

In Blueprint, use the Bind Event to On Speech Started and Bind Event to On Speech Ended nodes. In C++:

// Assuming StreamingSoundWave is a UE reference to a UStreamingSoundWave object (or its derived type, such as UCapturableSoundWave)

// Bind to the OnSpeechStartedNative delegate
StreamingSoundWave->OnSpeechStartedNative.AddWeakLambda(this, [this]()
{
 // Handle the result when speech starts
});

// Bind to the OnSpeechEndedNative delegate
StreamingSoundWave->OnSpeechEndedNative.AddWeakLambda(this, [this]()
{
 // Handle the result when speech ends
});
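The examples above use AddWeakLambda so that the lambda is not executed once the bound object (this) has been destroyed.

As a rough plain C++ analogue, the sketch below stores a std::weak_ptr to each listener's owner. It invokes a binding only while its owner is alive and drops bindings whose owner has expired. This is not Unreal code, and SpeechEvent and AddWeak are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Plain C++ analogue (not Unreal code) of a delegate with weak bindings:
// each binding remembers its owner through a weak_ptr and is only invoked
// while that owner is still alive.
class SpeechEvent
{
public:
    template <typename OwnerT>
    void AddWeak(const std::shared_ptr<OwnerT>& Owner, std::function<void()> Callback)
    {
        Bindings.push_back(Binding{std::weak_ptr<void>(Owner), std::move(Callback)});
    }

    // Calls every binding whose owner is alive and removes bindings whose
    // owner has been destroyed. Not safe to call AddWeak from inside a callback.
    void Broadcast()
    {
        for (auto It = Bindings.begin(); It != Bindings.end();)
        {
            if (std::shared_ptr<void> Pinned = It->Owner.lock())
            {
                It->Callback(); // Pinned keeps the owner alive during the call
                ++It;
            }
            else
            {
                It = Bindings.erase(It);
            }
        }
    }

    std::size_t NumBindings() const { return Bindings.size(); }

private:
    struct Binding
    {
        std::weak_ptr<void> Owner;
        std::function<void()> Callback;
    };
    std::vector<Binding> Bindings;
};
```

In Unreal, AddWeakLambda ties the binding to a UObject rather than a shared_ptr. The motivation is the same, though: a listener that goes away should not leave behind a callback that captures a dangling this.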

Comparing VAD Providers

Default VAD (libfvad)

Advantages:

Lightweight and efficient
Works on all platforms
Minimal resource usage
Suitable for mobile and low-powered devices

Best for:

Simple voice detection in quiet environments
Mobile applications
Projects where performance is a priority
When universal platform support is required
