How to use the plugin

The Runtime Speech Recognizer plugin is designed to recognize words from incoming audio data. It uses a slightly modified version of whisper.cpp to work with the engine. To use the plugin, follow these steps:

Editor side

  1. Select the appropriate language models for your project as described here.

Runtime side

  1. Create a Speech Recognizer and set the necessary parameters (CreateSpeechRecognizer, for parameters see here).
  2. Bind to the needed delegates (OnRecognitionFinished, OnRecognizedTextSegment and OnRecognitionError).
  3. Start the speech recognition (StartSpeechRecognition).
  4. Process audio data and wait for results from the delegates (ProcessAudioData).
  5. Stop the speech recognizer when needed (e.g., after the OnRecognitionFinished broadcast).

The plugin accepts incoming audio in 32-bit floating-point interleaved PCM format. It works well with the Runtime Audio Importer but does not directly depend on it.
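If your audio source produces 16-bit integer PCM instead, it has to be converted before it is handed to the recognizer. The following is a minimal standalone sketch of that conversion in plain C++. The function names `Int16ToFloat32Interleaved` and `NumFrames` are hypothetical helpers written for this illustration and are not part of the plugin.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the plugin): converts signed 16-bit
// interleaved PCM to 32-bit float interleaved PCM in the range [-1.0, 1.0).
// The channel order is left unchanged, so a stereo buffer stays laid out
// as L0, R0, L1, R1, ...
std::vector<float> Int16ToFloat32Interleaved(const std::vector<std::int16_t>& Input)
{
    std::vector<float> Output;
    Output.reserve(Input.size());
    for (const std::int16_t Sample : Input)
    {
        // Dividing by 32768 maps the int16 range [-32768, 32767] into [-1.0, 1.0).
        Output.push_back(static_cast<float>(Sample) / 32768.0f);
    }
    return Output;
}

// Hypothetical helper: number of frames (one sample per channel) in an
// interleaved buffer holding NumSamples samples.
std::size_t NumFrames(std::size_t NumSamples, std::size_t NumChannels)
{
    return NumChannels == 0 ? 0 : NumSamples / NumChannels;
}
```

In an interleaved buffer the samples of all channels for one point in time are stored next to each other, which is why the conversion can run over the buffer sample by sample without knowing the channel count.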

Recognition parameters

The plugin supports both streaming and non-streaming audio data recognition. To apply recognition parameters suited to your use case, call SetStreamingDefaults or SetNonStreamingDefaults. You can also set individual parameters manually, such as the number of threads, the step size, whether to translate the incoming speech to English, and whether to use past transcription. Refer to the Recognition Parameter List for a complete list of available parameters.
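To give a feel for what a step size means in a streaming setup, here is a minimal standalone C++ sketch. It assumes that the step size is a duration in milliseconds and that audio is accumulated until one full step is available before it is processed. That interpretation, the `StepBuffer` class, and all of its members are assumptions made for this illustration. They do not describe the plugin's internal implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration (not the plugin's code): accumulates interleaved
// float PCM and releases it in fixed-size steps of StepSizeMs milliseconds.
// All constructor arguments are expected to be positive.
class StepBuffer
{
public:
    StepBuffer(int SampleRate, int NumChannels, int StepSizeMs)
        : SamplesPerStep(static_cast<std::size_t>(SampleRate) * NumChannels * StepSizeMs / 1000)
    {
    }

    // Appends samples and returns every complete step that is now available.
    std::vector<std::vector<float>> Push(const std::vector<float>& Samples)
    {
        Pending.insert(Pending.end(), Samples.begin(), Samples.end());

        std::vector<std::vector<float>> Steps;
        const auto StepLength = static_cast<std::ptrdiff_t>(SamplesPerStep);
        while (SamplesPerStep > 0 && Pending.size() >= SamplesPerStep)
        {
            // Copy one full step out of the pending data, then drop it.
            Steps.emplace_back(Pending.begin(), Pending.begin() + StepLength);
            Pending.erase(Pending.begin(), Pending.begin() + StepLength);
        }
        return Steps;
    }

    // Samples that are buffered but do not yet form a complete step.
    std::size_t PendingSamples() const { return Pending.size(); }

private:
    std::size_t SamplesPerStep;
    std::vector<float> Pending;
};
```

With 16 kHz mono audio and a 500 ms step, each step holds 8000 samples. A smaller step gives more frequent results at the cost of more recognition passes, which is the kind of trade-off that can differ between streaming and non-streaming use.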

Improving performance

Refer to the How to improve performance section for tips on optimizing the plugin's performance.

Voice Activity Detection (VAD)

When processing audio input, especially in streaming scenarios, it's recommended to use Voice Activity Detection (VAD) to filter out empty or noise-only audio segments before they reach the recognizer. This filtering can be enabled on the capturable sound wave side using the Runtime Audio Importer plugin. It helps prevent the language models from hallucinating, that is, from attempting to find patterns in noise and generating incorrect transcriptions. For detailed instructions on VAD configuration, refer to the Voice Activity Detection documentation.
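The idea of gating audio before recognition can be illustrated with a very simple energy-based check. The sketch below is a deliberately simplified stand-in and is not the VAD used by the Runtime Audio Importer. The function names `ComputeRms` and `LooksLikeSpeech` and the threshold value of 0.01 are assumptions chosen for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: root-mean-square level of a block of float PCM samples.
float ComputeRms(const std::vector<float>& Samples)
{
    if (Samples.empty())
    {
        return 0.0f;
    }
    double SumOfSquares = 0.0;
    for (const float Sample : Samples)
    {
        SumOfSquares += static_cast<double>(Sample) * Sample;
    }
    return static_cast<float>(std::sqrt(SumOfSquares / static_cast<double>(Samples.size())));
}

// Hypothetical gate: returns true when the block is loud enough to be worth
// forwarding to a recognizer. The default threshold is an arbitrary example
// value and would need tuning for a real microphone and environment.
bool LooksLikeSpeech(const std::vector<float>& Samples, float RmsThreshold = 0.01f)
{
    return ComputeRms(Samples) >= RmsThreshold;
}
```

A block for which `LooksLikeSpeech` returns false would simply be dropped instead of being passed on. An energy threshold cannot tell speech from loud noise, which is one reason to prefer a dedicated VAD in practice.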

In the demo project included with the plugin, VAD is enabled by default. You can find more information about the demo implementation at Demo Project.

Examples

A demo project is included in the plugin's Content -> Demo folder, which you can use as a reference implementation.

These examples illustrate how to use the Runtime Speech Recognizer plugin with both streaming and non-streaming audio input, using the Runtime Audio Importer to obtain the audio data. Note that the Runtime Audio Importer must be downloaded separately to access the audio importing features shown in the examples (e.g. capturable sound wave and ImportAudioFromFile). The examples are intended only to illustrate the core concept and do not include error handling.

Streaming audio input

This example captures audio data from the microphone as a stream using the Capturable sound wave and passes it to the speech recognizer. Copyable nodes.

Note: In UE 5.3 and potentially in other versions, some of the nodes you've copied might be missing from your Blueprints. This can happen due to differences in how nodes are serialized/deserialized between engine versions. To make sure everything functions correctly, double-check that all nodes are present and properly connected.

More extensive example

Non-streaming audio input

This example imports audio data into the Imported sound wave and recognizes the full audio data once it has been imported. Copyable nodes.