Audio Processing Guide

This guide covers how to set up different audio input methods to feed audio data to your lip sync generators. Make sure you've completed the Setup Guide before proceeding.

Audio Input Processing

You need to set up a method to process audio input. There are several ways to do this depending on your audio source.

Lip Sync During Audio Capture

This approach performs lip sync in real time while you speak into the microphone:

  1. Create a Capturable Sound Wave using Runtime Audio Importer
  2. Before starting to capture audio, bind to the OnPopulateAudioData delegate
  3. In the bound function, call ProcessAudioData from your Runtime Viseme Generator
  4. Start capturing audio from the microphone
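
If you work in C++ rather than Blueprints, the four steps map onto the plugin APIs roughly as sketched below. Treat this as a minimal sketch: the component class is hypothetical, and the ProcessAudioData parameter list is an assumption based on the Blueprint node, so verify the exact signatures against the plugin headers.

    // Minimal sketch of steps 1-4 (assumed signatures; verify against the
    // Runtime Audio Importer and lip sync plugin headers).
    #include "Sound/CapturableSoundWave.h" // Runtime Audio Importer (path may vary by version)

    void UMyLipSyncComponent::StartMicrophoneLipSync()
    {
        // Step 1: create a Capturable Sound Wave
        CapturableSoundWave = UCapturableSoundWave::CreateCapturableSoundWave();

        // Step 2: bind to OnPopulateAudioData BEFORE starting capture so the
        // first audio chunks are not missed
        CapturableSoundWave->OnPopulateAudioData.AddDynamic(
            this, &UMyLipSyncComponent::OnAudioDataPopulated);

        // Step 4: start capturing from the microphone (device ID 0 = default)
        CapturableSoundWave->StartCapture(0);
    }

    // Step 3: forward each populated chunk to the viseme generator. The
    // parameter list of ProcessAudioData is assumed here; some plugin
    // versions may also expect a sample rate and channel count.
    void UMyLipSyncComponent::OnAudioDataPopulated(const TArray<float>& PopulatedAudioData)
    {
        if (IsValid(VisemeGenerator))
        {
            VisemeGenerator->ProcessAudioData(PopulatedAudioData);
        }
    }

Note that OnAudioDataPopulated must be declared as a UFUNCTION() in the component's header for AddDynamic to bind it, and both CapturableSoundWave and VisemeGenerator should be UPROPERTY() members so they are not garbage collected mid-capture.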

Processing Performance Tips

  • Chunk Size: To process audio in smaller chunks for more responsive lip sync, adjust the calculation in the SetNumSamplesPerChunk function. For example, dividing the sample rate by 150 (streaming a chunk every ~6.67 ms) instead of 100 (every 10 ms) gives more frequent lip sync updates; see the first sketch after this list.

  • Buffer Management: The mood-enabled model processes audio in 320-sample frames (20 ms at 16 kHz). Align your audio input timing with this frame size for optimal performance; the second sketch after this list shows one way to regroup incoming chunks.

  • Generator Recreation: For reliable operation with Realistic models, recreate the generator each time you want to feed new audio data after a period of inactivity; see the third sketch below.
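
For the chunk-size tip, the arithmetic is straightforward. A sketch, assuming SetNumSamplesPerChunk is available on your sound wave object as referenced above:

    // Smaller chunks -> more frequent lip sync updates at slightly higher cost.
    const int32 SampleRate = 16000; // substitute your actual capture sample rate

    // SampleRate / 100 -> one chunk every 10 ms
    // SampleRate / 150 -> one chunk every ~6.67 ms
    CapturableSoundWave->SetNumSamplesPerChunk(SampleRate / 150);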
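
One way to honor the 320-sample frame size is to accumulate incoming samples and forward only whole frames. The helper below is hypothetical (PendingSamples is an assumed TArray<float> member, not plugin API) and presumes the audio is already 16 kHz mono:

    // Hypothetical helper: regroup arbitrary-sized chunks into the 320-sample
    // (20 ms at 16 kHz) frames the mood-enabled model consumes.
    void UMyLipSyncComponent::FeedFrameAligned(const TArray<float>& Incoming)
    {
        PendingSamples.Append(Incoming);

        constexpr int32 FrameSize = 320; // 320 samples / 16000 Hz = 20 ms

        while (PendingSamples.Num() >= FrameSize)
        {
            // Copy one complete frame and hand it to the generator
            const TArray<float> Frame(PendingSamples.GetData(), FrameSize);
            VisemeGenerator->ProcessAudioData(Frame);
            PendingSamples.RemoveAt(0, FrameSize);
        }
        // Any remainder stays buffered until the next chunk arrives
    }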
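
Generator recreation itself is just a matter of discarding the old instance and calling the same creation function you used during setup; the wrapper name below is hypothetical:

    // Illustrative pattern: rebuild the generator before feeding audio again
    // after a period of inactivity (relevant to the Realistic models).
    void UMyLipSyncComponent::EnsureFreshGenerator()
    {
        if (bHasBeenIdle)
        {
            // CreateGenerator() stands in for whichever Create... call you
            // used originally; the old instance is released for GC.
            VisemeGenerator = CreateGenerator();
            bHasBeenIdle = false;
        }
    }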

Next Steps

Once you have audio processing set up, you may want to:

  • Learn about Configuration options to fine-tune your lip sync behavior
  • Add laughter animation for enhanced expressiveness
  • Combine lip sync with existing facial animations using the layering techniques described in the Configuration guide