Voice activity detection

Streaming Sound Wave, along with its derived types such as Capturable Sound Wave, supports Voice Activity Detection (VAD). VAD filters incoming audio data to populate the internal buffer only when voice is detected. This feature is implemented using libfvad.
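The gating behavior described above can be pictured with a minimal C++ sketch. Everything here is hypothetical and written for illustration only: `VoiceGatedBuffer` and `energyAboveThreshold` are not part of the plugin or of libfvad, and the amplitude threshold is only a stand-in for a real voice detector.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, not the plugin's implementation: a frame is appended
// to the buffer only when the supplied classifier reports voice.
class VoiceGatedBuffer {
public:
    using Classifier = std::function<bool(const std::vector<int16_t>&)>;

    explicit VoiceGatedBuffer(Classifier classifier)
        : isVoice_(std::move(classifier)) {}

    // Returns true if the frame was kept, false if it was dropped.
    bool push(const std::vector<int16_t>& frame) {
        if (!isVoice_(frame)) {
            return false;
        }
        samples_.insert(samples_.end(), frame.begin(), frame.end());
        return true;
    }

    const std::vector<int16_t>& samples() const { return samples_; }

private:
    Classifier isVoice_;
    std::vector<int16_t> samples_;
};

// Placeholder classifier (an assumption for this sketch): treats a frame as
// voice when its mean absolute amplitude reaches the threshold.
inline bool energyAboveThreshold(const std::vector<int16_t>& frame,
                                 int threshold = 500) {
    if (frame.empty()) {
        return false;
    }
    long long sum = 0;
    for (int16_t s : frame) {
        sum += s < 0 ? -static_cast<long long>(s) : static_cast<long long>(s);
    }
    return sum / static_cast<long long>(frame.size()) >= threshold;
}
```

With a buffer constructed from a lambda that forwards to `energyAboveThreshold`, quiet frames are dropped and only frames classified as voice reach `samples()`.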

To enable VAD, call the Toggle VAD function after creating the sound wave.

Toggle VAD node

Once VAD is enabled, you can set the VAD mode or reset the VAD as needed. These functions should be called only when VAD is enabled.

Set VAD Mode node

Reset VAD node

Speech Start and End detection

Voice Activity Detection detects not only the presence of speech but also the start and end of speech activity. This is useful for triggering events when speech begins or ends during playback or capture.

You can customize the sensitivity of speech start and end detection by adjusting two parameters: the minimum speech duration and the silence duration. These parameters help fine-tune the detection to avoid false positives, such as a brief noise being treated as the start of speech or a short pause being treated as the end of speech.

Minimum Speech Duration

The Minimum Speech Duration parameter sets the minimum amount of continuous voice activity required to trigger a speech start event. This filters out brief noises that should not be considered speech, so that only sustained voice activity is recognized. The default value for Minimum Speech Duration is 300 milliseconds.

Set Minimum Speech Duration node

Silence Duration

The Silence Duration parameter sets the duration of silence required to trigger a speech end event. This prevents speech detection from ending prematurely during natural pauses between words or sentences. The default value for Silence Duration is 500 milliseconds.

Set Silence Duration node
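To illustrate how the two duration parameters interact, here is a minimal C++ sketch of a start/end detector driven by per-frame voice decisions. It is an assumption-laden illustration rather than the plugin's actual code: the `SpeechEdgeDetector` class, its method names, and the exact reset rules are hypothetical. Only the default values (300 ms and 500 ms) come from the text above.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper, not part of the plugin API: turns per-frame voice
// decisions into speech start/end events using two duration thresholds.
class SpeechEdgeDetector {
public:
    using Callback = std::function<void()>;

    explicit SpeechEdgeDetector(int minSpeechMs = 300, int silenceMs = 500)
        : minSpeechMs_(minSpeechMs), silenceMs_(silenceMs) {}

    void onSpeechStarted(Callback cb) { started_ = std::move(cb); }
    void onSpeechEnded(Callback cb) { ended_ = std::move(cb); }

    // Feed one frame: `voiced` is the VAD decision, `frameMs` its length.
    void processFrame(bool voiced, int frameMs) {
        if (voiced) {
            // Any voiced frame cancels a pending speech end.
            silenceRunMs_ = 0;
            if (!speaking_) {
                voiceRunMs_ += frameMs;
                if (voiceRunMs_ >= minSpeechMs_) {
                    speaking_ = true;
                    voiceRunMs_ = 0;
                    if (started_) started_();
                }
            }
        } else {
            // Any unvoiced frame breaks the run of continuous voice.
            voiceRunMs_ = 0;
            if (speaking_) {
                silenceRunMs_ += frameMs;
                if (silenceRunMs_ >= silenceMs_) {
                    speaking_ = false;
                    silenceRunMs_ = 0;
                    if (ended_) ended_();
                }
            }
        }
    }

    bool isSpeaking() const { return speaking_; }

private:
    int minSpeechMs_;
    int silenceMs_;
    int voiceRunMs_ = 0;    // continuous voice seen while not yet speaking
    int silenceRunMs_ = 0;  // continuous silence seen while speaking
    bool speaking_ = false;
    Callback started_;
    Callback ended_;
};
```

In this sketch, a 200 ms burst of voice followed by silence never fires the start callback, and a 480 ms pause in the middle of speech never fires the end callback. Raising either threshold makes the corresponding event harder to trigger.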

Binding to Speech Delegates

You can bind to delegates that fire when speech starts or ends. This is useful for triggering custom behavior based on speech activity, such as starting or stopping text recognition, or adjusting the volume of other audio sources.

Bind Event To On Speech Started

Bind Event To On Speech Ended