Audio Processing Guide

This guide covers how to set up different audio input methods to feed audio data to your lip sync generators. Make sure you've completed the Setup Guide before proceeding.

Audio Input Processing

You need to set up a method to process audio input. There are several ways to do this depending on your audio source.

Lip Sync During Audio Capture

This approach performs lip sync in real time while speaking into the microphone:

  1. Create a Capturable Sound Wave using Runtime Audio Importer
  2. Before starting to capture audio, bind to the OnPopulateAudioData delegate
  3. In the bound function, call ProcessAudioData on your Runtime Viseme Generator
  4. Start capturing audio from the microphone
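The capture flow above can be sketched in plain C++. This is not the plugin's actual API; the types and the callback below are hypothetical stand-ins that mirror the bind-then-capture ordering of steps 2 and 4:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a Runtime Viseme Generator.
struct VisemeGenerator {
    std::size_t SamplesProcessed = 0;
    void ProcessAudioData(const std::vector<float>& Samples) {
        // A real generator would analyze the samples and emit visemes here.
        SamplesProcessed += Samples.size();
    }
};

// Hypothetical stand-in for a Capturable Sound Wave.
struct CapturableSoundWave {
    // Analogue of the OnPopulateAudioData delegate: fires whenever the
    // microphone delivers a new batch of samples.
    std::function<void(const std::vector<float>&)> OnPopulateAudioData;

    void StartCapture() { /* a real implementation would open the microphone */ }

    // Simulates the microphone pushing a batch of captured samples.
    void FeedSamples(const std::vector<float>& Samples) {
        if (OnPopulateAudioData) OnPopulateAudioData(Samples);
    }
};
```

The key point the steps encode: bind the delegate before calling StartCapture, so no early samples are dropped.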

Processing Performance Tips

  • Chunk Size: Increasing the ProcessingChunkSize configuration option (e.g. to 320, 480, or 640 samples) reduces per-call overhead and can noticeably improve processing performance, with minimal impact on quality or responsiveness.
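How much audio each chunk spans is easy to compute. Assuming the 16 kHz sample rate mentioned under Buffer Management (16 samples per millisecond), a minimal sketch:

```cpp
// Duration of one chunk at a 16 kHz sample rate:
// 16000 samples per second = 16 samples per millisecond.
constexpr int ChunkMs(int Samples) { return Samples / 16; }

static_assert(ChunkMs(320) == 20, "320 samples span 20 ms");
static_assert(ChunkMs(480) == 30, "480 samples span 30 ms");
static_assert(ChunkMs(640) == 40, "640 samples span 40 ms");
```

So the suggested chunk sizes correspond to 20, 30, and 40 ms of audio per processing call.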

  • Model Type: When using Realistic models, keeping the Highly Optimized model type (selected by default) gives the best performance. Note that the original model may produce slightly better quality, particularly with noisy audio.

  • Buffer Management: The mood-enabled model processes audio in 320-sample frames (20ms at 16kHz). Ensure your audio input timing aligns with this for optimal performance.
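One way to keep input aligned to the model's 320-sample frames is to buffer incoming audio and only forward complete frames. The accumulator below is a hypothetical helper, not part of the plugin, sketched in plain C++:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical frame accumulator: collects audio batches of arbitrary length
// and emits complete 320-sample frames (20 ms at 16 kHz), so a downstream
// generator always receives the frame size it expects.
class FrameAccumulator {
public:
    static constexpr std::size_t FrameSize = 320;  // 20 ms at 16 kHz

    // Appends Samples and returns every complete frame now available.
    std::vector<std::vector<float>> Push(const std::vector<float>& Samples) {
        Pending.insert(Pending.end(), Samples.begin(), Samples.end());
        std::vector<std::vector<float>> Frames;
        while (Pending.size() >= FrameSize) {
            Frames.emplace_back(Pending.begin(), Pending.begin() + FrameSize);
            Pending.erase(Pending.begin(), Pending.begin() + FrameSize);
        }
        return Frames;
    }

private:
    std::vector<float> Pending;  // leftover samples shorter than one frame
};
```

Partial frames simply wait in the buffer until the next batch arrives, which keeps frame timing stable even when the capture device delivers oddly sized batches.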

  • Generator Recreation: For reliable operation with Realistic models, recreate the generator before feeding new audio data after a period of inactivity.

Next Steps

Once you have audio processing set up, you may want to:

  • Learn about Configuration options to fine-tune your lip sync behavior
  • Add laughter animation for enhanced expressiveness
  • Combine lip sync with existing facial animations using the layering techniques described in the Configuration guide