How to use the plugin
The Runtime Speech Recognizer plugin is designed to recognize words from incoming audio data. It uses a slightly modified version of whisper.cpp to work with the engine. To use the plugin, follow these steps:
Editor side
- Select the appropriate language models for your project as described here.
Runtime side
- Create a Speech Recognizer and set the necessary parameters (CreateSpeechRecognizer, for parameters see here).
- Bind to the needed delegates (OnRecognitionFinished, OnRecognizedTextSegment and OnRecognitionError).
- Start the speech recognition (StartSpeechRecognition).
- Process audio data and wait for results from the delegates (ProcessAudioData).
- Stop the speech recognizer when needed (e.g., after the OnRecognitionFinished broadcast).
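The order of the runtime steps above can be sketched in plain C++. This is only an illustration of the call sequence: FMockSpeechRecognizer is a stand-in written for this sketch, not the plugin's class, the std::function members stand in for the delegates, and both the ProcessAudioData signature and the StopSpeechRecognition name are assumptions. The "recognition" is faked by reporting how many samples were received.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the plugin's recognizer, NOT the real class. It only mimics
// the call order; no actual speech recognition happens here.
class FMockSpeechRecognizer
{
public:
    // Callbacks standing in for the delegates named in the steps above.
    std::function<void(const std::string&)> OnRecognizedTextSegment;
    std::function<void()> OnRecognitionFinished;
    std::function<void(const std::string&)> OnRecognitionError;

    void StartSpeechRecognition() { bRunning = true; }

    // Assumed signature: interleaved float PCM plus a flag marking the last chunk.
    void ProcessAudioData(const std::vector<float>& PCMData, bool bLast)
    {
        if (!bRunning)
        {
            if (OnRecognitionError) OnRecognitionError("Recognizer is not running");
            return;
        }
        if (OnRecognizedTextSegment)
        {
            OnRecognizedTextSegment("<" + std::to_string(PCMData.size()) + " samples>");
        }
        if (bLast && OnRecognitionFinished) OnRecognitionFinished();
    }

    // Hypothetical name for the "stop" step.
    void StopSpeechRecognition() { bRunning = false; }

private:
    bool bRunning = false;
};

// Runs the steps in order and returns the text segments that were reported.
std::vector<std::string> RunRecognitionFlow(const std::vector<std::vector<float>>& Chunks)
{
    std::vector<std::string> Segments;

    // 1. Create the recognizer.
    FMockSpeechRecognizer Recognizer;

    // 2. Bind to the callbacks before starting.
    Recognizer.OnRecognizedTextSegment = [&](const std::string& Text) { Segments.push_back(Text); };
    Recognizer.OnRecognitionError = [&](const std::string& Message) { Segments.push_back("error: " + Message); };
    // 5. Stop once the recognizer reports that it has finished.
    Recognizer.OnRecognitionFinished = [&]() { Recognizer.StopSpeechRecognition(); };

    // 3. Start recognition.
    Recognizer.StartSpeechRecognition();

    // 4. Feed audio; the last chunk is flagged so the mock can finish.
    for (std::size_t Index = 0; Index < Chunks.size(); ++Index)
    {
        Recognizer.ProcessAudioData(Chunks[Index], Index + 1 == Chunks.size());
    }
    return Segments;
}
```

The point to take from the sketch is that callbacks are bound before recognition starts, so no result can arrive without a listener.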
The plugin expects incoming audio as 32-bit floating-point interleaved PCM. It works well with the Runtime Audio Importer, but it doesn't directly depend on it.
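If your audio comes from a source other than the Runtime Audio Importer, it may need converting first. The following standalone sketch shows two common preparations: scaling signed 16-bit samples to 32-bit floats, and interleaving two separate channel buffers. The helper names ConvertInt16ToFloat32 and InterleaveStereo are made up for this illustration and are not part of the plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts interleaved signed 16-bit PCM to interleaved 32-bit float PCM.
// Dividing by 32768 maps the int16 range onto [-1.0, 1.0). The channel order
// is untouched, so stereo input stays L0, R0, L1, R1, ...
std::vector<float> ConvertInt16ToFloat32(const std::vector<std::int16_t>& Samples)
{
    std::vector<float> Result;
    Result.reserve(Samples.size());
    for (std::int16_t Sample : Samples)
    {
        Result.push_back(static_cast<float>(Sample) / 32768.0f);
    }
    return Result;
}

// Interleaves two planar (one buffer per channel) float channels into a
// single buffer laid out as L0, R0, L1, R1, ...
std::vector<float> InterleaveStereo(const std::vector<float>& Left, const std::vector<float>& Right)
{
    assert(Left.size() == Right.size());
    std::vector<float> Result;
    Result.reserve(Left.size() * 2);
    for (std::size_t Index = 0; Index < Left.size(); ++Index)
    {
        Result.push_back(Left[Index]);
        Result.push_back(Right[Index]);
    }
    return Result;
}
```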
Recognition parameters
The plugin supports both streaming and non-streaming audio data recognition. To adjust recognition parameters for your specific use case, call SetStreamingDefaults or SetNonStreamingDefaults. Additionally, you have the flexibility to manually set individual parameters such as the number of threads, step size, whether to translate incoming language to English, and whether to use past transcription. Refer to the Recognition Parameter List for a complete list of available parameters.
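To give a feel for what a step size can mean in streaming recognition, here is a standalone sketch that buffers incoming chunks until one step's worth of audio is available. It assumes the step size is a duration in milliseconds of audio handled per recognition pass; the plugin may interpret the parameter differently, and SamplesPerStep and FStepBuffer are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of interleaved samples that cover StepSizeMs milliseconds of audio.
std::size_t SamplesPerStep(int SampleRate, int NumChannels, int StepSizeMs)
{
    assert(SampleRate > 0 && NumChannels > 0 && StepSizeMs > 0);
    return static_cast<std::size_t>(SampleRate) * NumChannels * StepSizeMs / 1000;
}

// Collects incoming chunks of any size and hands them out one full step at a time.
class FStepBuffer
{
public:
    explicit FStepBuffer(std::size_t InStepSamples) : StepSamples(InStepSamples)
    {
        assert(StepSamples > 0);
    }

    // Appends a newly captured chunk to the pending audio.
    void Append(const std::vector<float>& Chunk)
    {
        Pending.insert(Pending.end(), Chunk.begin(), Chunk.end());
    }

    // Moves one full step into OutStep. Returns false, leaving OutStep
    // unchanged, if less than a full step is buffered.
    bool PopStep(std::vector<float>& OutStep)
    {
        if (Pending.size() < StepSamples)
        {
            return false;
        }
        const auto StepEnd = Pending.begin() + static_cast<std::ptrdiff_t>(StepSamples);
        OutStep.assign(Pending.begin(), StepEnd);
        Pending.erase(Pending.begin(), StepEnd);
        return true;
    }

    std::size_t NumPending() const { return Pending.size(); }

private:
    std::size_t StepSamples;
    std::vector<float> Pending;
};
```

Under this assumption, a smaller step returns text sooner but gives each pass less audio to work with, while a larger step does the opposite.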
Improving performance
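To improve performance, consider increasing the number of threads used for recognition. For example, you can set the number of threads to 16, although the benefit typically depends on how many CPU cores the target device has.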
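One way to pick a thread count is to cap the requested value at the number of hardware threads the device reports, since threads beyond that usually bring little benefit for compute-heavy work. The sketch below is standalone C++; ChooseThreadCount is a hypothetical helper, not a plugin function, and its result would still need to be passed to the plugin's thread-count parameter.

```cpp
#include <algorithm>
#include <thread>

// Caps the requested thread count at the number of hardware threads.
// std::thread::hardware_concurrency() may return 0 when the value cannot be
// determined; in that case the requested count is used as-is (at least 1).
unsigned ChooseThreadCount(unsigned Requested,
                           unsigned HardwareThreads = std::thread::hardware_concurrency())
{
    const unsigned AtLeastOne = std::max(1u, Requested);
    if (HardwareThreads == 0)
    {
        return AtLeastOne;
    }
    return std::min(AtLeastOne, HardwareThreads);
}
```

For example, requesting 16 threads on a device that reports 8 hardware threads yields 8, while requesting 4 on the same device yields 4.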
Examples
These examples illustrate how to use the Runtime Speech Recognizer plugin with both streaming and non-streaming audio input, using the Runtime Audio Importer as the source of audio data. Note that the Runtime Audio Importer must be downloaded separately to access the audio importing features shown in the examples (e.g. capturable sound wave and ImportAudioFromFile). The examples are intended only to illustrate the core concept and do not include error handling.
Streaming audio input
This example captures audio data from the microphone as a stream using the Capturable sound wave and passes it to the speech recognizer. Copyable nodes.
Note: In UE 5.3 and potentially in other versions, you might encounter a situation where certain nodes you've copied are missing in your Blueprints. This can happen due to differences in how nodes are serialized/deserialized between different engine versions. To make sure everything functions correctly, double-check that all nodes are properly connected.
More extensive example
Non-streaming audio input
This example imports audio data to the Imported sound wave and recognizes the full audio data once it has been imported. Copyable nodes.