How to improve performance

Decrease Step Size

By default, the step size is 5000 ms (5 seconds), meaning the captured audio data is recognized every 5 seconds. If you want the audio to be recognized more frequently, you can decrease the step size, for example to 500 ms (0.5 seconds).
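
Below is a minimal sketch of how this could look from C++. The identifiers used here (USpeechRecognizer and SetStepSize) are assumptions made for illustration rather than confirmed names from the plugin's API, so check the plugin documentation for the actual function or property to use.

```cpp
// Hypothetical sketch: USpeechRecognizer and SetStepSize are placeholder
// names; consult the plugin's API reference for the real ones.
void ConfigureStepSize(USpeechRecognizer* SpeechRecognizer)
{
	if (SpeechRecognizer)
	{
		// Recognize captured audio every 500 ms instead of the default
		// 5000 ms. Smaller values give more frequent results at the cost
		// of running the model more often.
		SpeechRecognizer->SetStepSize(500);
	}
}
```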

Use a Smaller Language Model

You can consider using a smaller language model, such as Tiny Quantized (Q5_1), to reduce the model size and improve performance. Instructions on how to select a language model can be found here.

Alter CPU Instruction Sets

The underlying library used in the plugin is whisper.cpp, which uses CPU instruction sets to speed up recognition. Because UE limits how compiler flags can be passed to the build, the instruction sets are currently hard-coded in the plugin and enabled by various macros based on an approximation of whether the target is likely to support them. You can manually modify the SpeechRecognizerPrivate.h file to define the instruction sets that are supported by your target platform (a sketch follows the list below). Here is the list of instruction-set macros currently used by whisper.cpp, which you can define manually in SpeechRecognizerPrivate.h:

- AVX and AVX2 Family: __AVX__, __AVXVNNI__, __AVX2__, __AVX512F__, __AVX512VBMI__, __AVX512VNNI__, __AVX512BF16__
- Floating-Point and SIMD Extensions: __FMA__, __F16C__, __SSE3__, __SSSE3__
- ARM Architecture Extensions: __ARM_NEON, __ARM_FEATURE_SVE, __ARM_FEATURE_FMA, __ARM_FEATURE_FP16_VECTOR_ARITHMETIC, __ARM_FEATURE_MATMUL_INT8
- POWER Architecture Extensions: __POWER9_VECTOR__
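
As mentioned above, here is a minimal sketch of what such manual defines could look like near the top of SpeechRecognizerPrivate.h, assuming an x86-64 target that supports SSE3/SSSE3, AVX, AVX2, FMA, and F16C. This is illustrative rather than a drop-in configuration: only define the instruction sets your target CPUs actually support, since executing unsupported instructions will typically crash the process.

```cpp
// Illustrative only: force-enable a set of x86-64 SIMD macros for whisper.cpp.
// Adjust the list to match the CPUs you actually ship to.
#ifndef __SSE3__
#define __SSE3__ 1
#endif
#ifndef __SSSE3__
#define __SSSE3__ 1
#endif
#ifndef __AVX__
#define __AVX__ 1
#endif
#ifndef __AVX2__
#define __AVX2__ 1
#endif
#ifndef __FMA__
#define __FMA__ 1
#endif
#ifndef __F16C__
#define __F16C__ 1
#endif

// For an ARM target you would instead define the relevant ARM feature macros,
// for example:
// #define __ARM_NEON 1
```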

Use Acceleration Libraries

whisper.cpp can accelerate the recognition process by using the following libraries:

- Core ML on Apple Silicon devices
- OpenVINO on devices with x86 CPUs or Intel GPUs
- NVIDIA GPU support via CUDA on Windows or Linux
- BLAS CPU support via OpenBLAS
- BLAS CPU support via Intel MKL

Please note that these libraries are not included in the plugin by default; you need to install them manually by following the whisper.cpp instructions.