How to improve performance

The plugin uses a different GPU acceleration method on each platform, Vulkan on Windows and Metal on macOS and iOS, which significantly speeds up recognition. On other platforms, the plugin runs on the CPU and uses processor intrinsics for acceleration. You can improve performance further by following these recommendations:

  1. Use Voice Activity Detection (VAD)

    Voice Activity Detection is highly recommended: it improves recognizer responsiveness by sending speech for recognition as soon as the user stops talking, rather than at fixed time intervals. Silero VAD is particularly recommended for this purpose. For detailed implementation instructions, see the Voice Activity Detection documentation.

  2. Decrease Step Size

    By default, the step size is 5000 ms (5 seconds), so captured audio is recognized every 5 seconds. To recognize audio more frequently, decrease the step size, for example to 500 ms (0.5 seconds), at the cost of running recognition more often. If VAD is active (the recommended setup unless you specifically need fixed intervals), there is little reason to decrease the step size: in typical VAD setups such as Voice Activated Command Recognition or Auto-Initializing Voice Recognition with Final Buffer Processing, speech is recognized as soon as the user stops talking anyway.

  3. Use a Smaller Language Model

    Consider using a smaller language model, such as Tiny Quantized (Q5_1), which reduces the model size and improves performance, usually at some cost in recognition accuracy. For instructions on selecting a language model, see the language model selection documentation.

  4. Optimize Recognition State Management

    When working with microphone input, avoid stopping and restarting the speech recognizer unnecessarily. Each call to StopSpeechRecognition followed by StartSpeechRecognition forces resources to be reallocated, so instead of calling them frequently, control the audio input directly. For example, with a capturable sound wave, use StopCapture and StartCapture to pause and resume the audio flow while the recognition thread stays active.
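To illustrate the VAD recommendation (item 1), here is a minimal C++ sketch of the control flow a VAD enables: audio is buffered while the speaker talks and handed off for recognition as soon as a run of silent frames follows speech. As an assumption for illustration, it uses a simple RMS-energy threshold in place of Silero VAD (a neural model), and the class `EnergyVadSegmenter` is hypothetical, not part of the plugin's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical energy-based VAD, used here only to illustrate the control flow.
// It is NOT Silero VAD and does not call the plugin's API.
class EnergyVadSegmenter {
public:
    // threshold: RMS level at or above which a frame counts as speech.
    // hangoverFrames: consecutive silent frames that end an utterance.
    EnergyVadSegmenter(float threshold, std::size_t hangoverFrames)
        : threshold_(threshold), hangoverFrames_(hangoverFrames) {
        assert(hangoverFrames > 0 && "at least one silent frame must end an utterance");
    }

    // Feeds one frame of mono PCM samples in [-1, 1]. Returns true when an
    // utterance has just ended; fetch it with TakeSegment() and send it for
    // recognition right away instead of waiting for a fixed interval.
    bool PushFrame(const std::vector<float>& frame) {
        const bool voiced = Rms(frame) >= threshold_;
        if (voiced) {
            inSpeech_ = true;
            silentFrames_ = 0;
        }
        if (!inSpeech_) {
            return false;  // silence before any speech is discarded
        }
        segment_.insert(segment_.end(), frame.begin(), frame.end());
        if (!voiced && ++silentFrames_ >= hangoverFrames_) {
            inSpeech_ = false;  // enough trailing silence: the speaker stopped
            silentFrames_ = 0;
            return true;
        }
        return false;
    }

    // Returns the buffered utterance (speech plus trailing silence) and clears it.
    std::vector<float> TakeSegment() {
        std::vector<float> out;
        out.swap(segment_);
        return out;
    }

private:
    static float Rms(const std::vector<float>& frame) {
        if (frame.empty()) return 0.0f;
        double sum = 0.0;
        for (float s : frame) sum += static_cast<double>(s) * s;
        return static_cast<float>(std::sqrt(sum / static_cast<double>(frame.size())));
    }

    float threshold_;
    std::size_t hangoverFrames_;
    bool inSpeech_ = false;
    std::size_t silentFrames_ = 0;
    std::vector<float> segment_;
};
```

The hangover count controls how much trailing silence ends an utterance: a larger value tolerates short pauses mid-sentence but delays the hand-off to the recognizer.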
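The step size trade-off (item 2) comes down to simple arithmetic. The sketch below assumes 16 kHz mono audio and one recognition pass per step; both the sample rate and the helper functions `SamplesPerStep` and `PassesForCapture` are hypothetical illustrations, not plugin settings or API.

```cpp
#include <cstdint>

// Assumed sample rate for illustration only (16 kHz mono).
constexpr std::int64_t kSampleRateHz = 16000;

// Samples accumulated between two consecutive recognition passes.
constexpr std::int64_t SamplesPerStep(std::int64_t stepSizeMs,
                                      std::int64_t sampleRateHz = kSampleRateHz) {
    return stepSizeMs * sampleRateHz / 1000;
}

// Recognition passes triggered by a continuous capture of the given length,
// assuming one pass per completed step.
constexpr std::int64_t PassesForCapture(std::int64_t captureMs, std::int64_t stepSizeMs) {
    return captureMs / stepSizeMs;
}

static_assert(SamplesPerStep(5000) == 80000, "default 5 s step");
static_assert(SamplesPerStep(500) == 8000, "0.5 s step");
```

Under these assumptions, the default 5000 ms step accumulates 80,000 samples per pass and runs 12 passes per minute of capture, while a 500 ms step accumulates 8,000 samples and runs 120 passes per minute: results arrive sooner, but recognition runs ten times as often.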
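Recommendation 4 can be modeled with two mock classes: a recognizer whose `Start()` stands in for the expensive initialization, and a capture source whose `StartCapture()`/`StopCapture()` only gate the audio flow. Both `MockRecognizer` and `MockCapture` are hypothetical stand-ins written for illustration; only the `StartCapture`/`StopCapture` names mirror the text above, and none of this is the plugin's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recognizer: Start() represents costly setup (model loading,
// buffer allocation), so it counts how many times that setup happens.
class MockRecognizer {
public:
    void Start() {
        if (!running_) {
            running_ = true;
            ++initCount_;
        }
    }
    void Stop() { running_ = false; }
    void Feed(const std::vector<float>& samples) {
        assert(running_ && "audio fed to a stopped recognizer");
        samplesFed_ += samples.size();
    }
    bool IsRunning() const { return running_; }
    int InitCount() const { return initCount_; }
    std::size_t SamplesFed() const { return samplesFed_; }

private:
    bool running_ = false;
    int initCount_ = 0;
    std::size_t samplesFed_ = 0;
};

// Hypothetical capture source: pausing it only stops the audio flow;
// the recognizer stays initialized and ready for the next chunk.
class MockCapture {
public:
    explicit MockCapture(MockRecognizer& recognizer) : recognizer_(recognizer) {}
    void StartCapture() { capturing_ = true; }
    void StopCapture() { capturing_ = false; }

    // Called for every chunk the microphone produces; dropped while paused.
    void OnAudio(const std::vector<float>& chunk) {
        if (capturing_) recognizer_.Feed(chunk);
    }

private:
    MockRecognizer& recognizer_;
    bool capturing_ = false;
};
```

Toggling capture for several turns leaves the recognizer initialized exactly once, which is the point of the recommendation; stopping and restarting the recognizer for each turn would repeat the setup every time.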