Plugin Configuration

Model Configuration

Standard Model Configuration

The Create Runtime Viseme Generator node uses default settings that work well for most scenarios. Configuration is handled through the Animation Blueprint blending node properties.

For Animation Blueprint configuration options, see the Lip Sync Configuration section below.

Realistic Model Configuration

The Create Realistic MetaHuman Lip Sync Generator node accepts an optional Configuration parameter that allows you to customize the generator's behavior:

Model Type

The Model Type setting determines which version of the realistic model to use:

| Model Type | Performance | Visual Quality | Noise Handling | Recommended Use Cases |
|---|---|---|---|---|
| Highly Optimized (Default) | Highest performance, lowest CPU usage | Good quality | May show noticeable mouth movements with background noise or non-voice sounds | Clean audio environments, performance-critical scenarios |
| Semi-Optimized | Good performance, moderate CPU usage | High quality | Better stability with noisy audio | Balanced performance and quality, mixed audio conditions |
| Original | Suitable for real-time use on modern CPUs | Highest quality | Most stable with background noise and non-voice sounds | High-quality productions, noisy audio environments, when maximum accuracy is needed |
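
These options are set on the node in Blueprints; purely as an illustration of what the configuration amounts to, the hypothetical C++ struct below mirrors the settings described in this section (all type and field names are assumptions, not the plugin's actual API):

```cpp
// Illustrative only: these names mirror the options documented in this
// section; they are not the plugin's actual C++ API.
enum class ERealisticModelType
{
    HighlyOptimized, // default: highest performance, best for clean audio
    SemiOptimized,   // balanced performance and noise stability
    Original         // highest quality, most robust to background noise
};

struct FRealisticGeneratorConfig
{
    ERealisticModelType ModelType = ERealisticModelType::HighlyOptimized;
    int IntraOpThreads = 0;        // 0 = automatic detection (see Performance Settings)
    int InterOpThreads = 0;        // 0 = automatic detection (see Performance Settings)
    int ProcessingChunkSize = 160; // samples per inference step (10 ms at 16 kHz)
};

int main()
{
    FRealisticGeneratorConfig config;
    config.ModelType = ERealisticModelType::Original; // prioritize quality over CPU
}
```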

Performance Settings

Intra Op Threads: Controls the number of threads used for internal model processing operations.

  • 0 (Default/Automatic): Uses automatic detection (typically 1/4 of available CPU cores, maximum 4)
  • 1-16: Manually specify thread count. Higher values may improve performance on multi-core systems but use more CPU

Inter Op Threads: Controls the number of threads used for parallel execution of different model operations.

  • 0 (Default/Automatic): Uses automatic detection (typically 1/8 of available CPU cores, maximum 2)
  • 1-8: Manually specify thread count. Usually kept low for real-time processing
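
The automatic detection rule described above can be written out as a small helper. The sketch below reproduces the documented heuristic (roughly 1/4 of cores capped at 4 for intra-op, 1/8 capped at 2 for inter-op); the function names are ours:

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>

// 0 means automatic: intra-op uses ~1/4 of cores (max 4),
// inter-op uses ~1/8 of cores (max 2), per the documented defaults.
int ResolveIntraOpThreads(int requested)
{
    if (requested > 0)
        return requested; // manual override (1-16)
    const int cores = static_cast<int>(std::thread::hardware_concurrency());
    return std::clamp(cores / 4, 1, 4);
}

int ResolveInterOpThreads(int requested)
{
    if (requested > 0)
        return requested; // manual override (1-8)
    const int cores = static_cast<int>(std::thread::hardware_concurrency());
    return std::clamp(cores / 8, 1, 2);
}

int main()
{
    std::printf("intra-op: %d, inter-op: %d\n",
                ResolveIntraOpThreads(0), ResolveInterOpThreads(0));
}
```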

Processing Chunk Size

The Processing Chunk Size determines how many samples are processed in each inference step. The default value is 160 samples (10ms of audio at 16kHz):

  • Smaller values provide more frequent updates but increase CPU usage
  • Larger values reduce CPU load but may decrease lip sync responsiveness
  • Recommended to use multiples of 160 for optimal alignment
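
For example, targeting a roughly 30 ms update interval means 480 samples (3 × 160). A minimal helper that rounds a desired interval up to the next multiple of 160, assuming the documented 16 kHz processing rate (the function name is ours):

```cpp
#include <cstdio>

// Pick a Processing Chunk Size that is a multiple of 160 samples,
// assuming the documented 16 kHz rate (160 samples = 10 ms).
int ChunkSizeForMilliseconds(int milliseconds)
{
    const int samples = milliseconds * 16;             // 16 samples per ms at 16 kHz
    const int rounded = ((samples + 159) / 160) * 160; // round up to a multiple of 160
    return rounded > 0 ? rounded : 160;
}

int main()
{
    std::printf("30 ms -> %d samples\n", ChunkSizeForMilliseconds(30)); // 480
    std::printf("25 ms -> %d samples\n", ChunkSizeForMilliseconds(25)); // 480 (rounded up from 400)
}
```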

[Image: Setting Processing Chunk Size]

Mood-Enabled Model Configuration

The Create Realistic MetaHuman Lip Sync With Mood Generator node provides additional configuration options beyond the basic realistic model:

Basic Configuration

Lookahead Ms: Lookahead timing in milliseconds for improved lip sync accuracy.

  • Default: 80ms
  • Range: 20ms to 200ms (must be divisible by 20)
  • Higher values provide better synchronization but increase latency
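
Since the node requires the value to be within 20-200 ms and divisible by 20, it can help to snap user-supplied values before applying them. A minimal sketch under those documented constraints (the helper name is ours):

```cpp
#include <algorithm>
#include <cassert>

// Snap a requested lookahead to the documented constraints:
// clamp to 20-200 ms, then round down to a multiple of 20.
int SnapLookaheadMs(int requestedMs)
{
    const int clamped = std::clamp(requestedMs, 20, 200);
    return (clamped / 20) * 20;
}

int main()
{
    assert(SnapLookaheadMs(80) == 80);   // the default passes through
    assert(SnapLookaheadMs(95) == 80);   // 95 is not divisible by 20
    assert(SnapLookaheadMs(500) == 200); // clamped to the upper bound
}
```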

Output Type: Controls which facial controls are generated.

  • Full Face: All 81 facial controls (eyebrows, eyes, nose, mouth, jaw, tongue)
  • Mouth Only: Only mouth, jaw, and tongue-related controls

Performance Settings: Uses the same Intra Op Threads and Inter Op Threads settings as the regular realistic model.

Mood Settings

Available Moods:

  • Neutral, Happy, Sad, Disgust, Anger, Surprise, Fear
  • Confident, Excited, Bored, Playful, Confused

Mood Intensity: Controls how strongly the mood affects the animation (0.0 to 1.0)

Runtime Mood Control

You can adjust mood settings during runtime using the following functions:

  • Set Mood: Change the current mood type
  • Set Mood Intensity: Adjust how strongly the mood affects the animation (0.0 to 1.0)
  • Set Lookahead Ms: Modify the lookahead timing for synchronization
  • Set Output Type: Switch between Full Face and Mouth Only controls
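
These are exposed as Blueprint-callable functions. The hypothetical C++ usage sketch below shows the intent; the stub type and method signatures are assumptions for illustration, so consult the plugin's headers for the real API:

```cpp
// Hypothetical stand-in for the generator: the documentation names these
// functions, but the C++ signatures below are assumptions, not the
// plugin's actual API.
struct FMoodGeneratorStub
{
    void SetMood(const char* /*mood*/) {}         // e.g. "Happy", "Confident"
    void SetMoodIntensity(float /*intensity*/) {} // 0.0 - 1.0
    void SetLookaheadMs(int /*ms*/) {}            // 20-200, divisible by 20
    void SetOutputType(bool /*mouthOnly*/) {}     // true = Mouth Only
};

int main()
{
    FMoodGeneratorStub generator;
    generator.SetMood("Excited");     // switch the current mood at runtime
    generator.SetMoodIntensity(0.9f); // strong effect, within the 0.0-1.0 range
    generator.SetLookaheadMs(100);    // trade a little latency for accuracy
    generator.SetOutputType(true);    // mouth, jaw, and tongue controls only
}
```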

[Image: Mood Configuration]

Mood Selection Guide

Choose appropriate moods based on your content:

| Mood | Best For | Typical Intensity Range |
|---|---|---|
| Neutral | General conversation, narration, default state | 0.5 - 1.0 |
| Happy | Positive content, cheerful dialogue, celebrations | 0.6 - 1.0 |
| Sad | Melancholic content, emotional scenes, somber moments | 0.5 - 0.9 |
| Disgust | Negative reactions, distasteful content, rejection | 0.4 - 0.8 |
| Anger | Aggressive dialogue, confrontational scenes, frustration | 0.6 - 1.0 |
| Surprise | Unexpected events, revelations, shock reactions | 0.7 - 1.0 |
| Fear | Threatening situations, anxiety, nervous dialogue | 0.5 - 0.9 |
| Confident | Professional presentations, leadership dialogue, assertive speech | 0.7 - 1.0 |
| Excited | Energetic content, announcements, enthusiastic dialogue | 0.8 - 1.0 |
| Bored | Monotonous content, disinterested dialogue, tired speech | 0.3 - 0.7 |
| Playful | Casual conversation, humor, light-hearted interactions | 0.6 - 0.9 |
| Confused | Question-heavy dialogue, uncertainty, bewilderment | 0.4 - 0.8 |

Animation Blueprint Configuration

Lip Sync Configuration

The Blend Runtime MetaHuman Lip Sync node has configuration options in its properties panel:

| Property | Default | Description |
|---|---|---|
| Interpolation Speed | 25 | Controls how quickly the lip movements transition between visemes. Higher values result in faster, more abrupt transitions. |
| Reset Time | 0.2 | The duration in seconds after which the lip sync is reset. This prevents the lip sync from continuing after the audio has stopped. |
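
One plausible reading of how these two properties interact is sketched below as a small simulation; it models the documented behavior (a rate-limited approach to the target plus a silence timeout), not the node's actual implementation:

```cpp
#include <cmath>
#include <cstdio>

// The blend weight moves toward its target at a rate set by Interpolation
// Speed, and snaps back to zero once no audio has arrived for Reset Time
// seconds. A sketch only; not the node's actual implementation.
struct FVisemeBlendState
{
    float Current = 0.0f;
    float SilenceSeconds = 0.0f;
};

float TickViseme(FVisemeBlendState& state, float target, bool hasAudio,
                 float deltaSeconds, float interpSpeed = 25.0f,
                 float resetTime = 0.2f)
{
    state.SilenceSeconds = hasAudio ? 0.0f : state.SilenceSeconds + deltaSeconds;
    if (state.SilenceSeconds >= resetTime)
    {
        state.Current = 0.0f; // reset: audio stopped, close the mouth
        return state.Current;
    }
    // Exponential approach: higher speed = faster, more abrupt transitions.
    const float alpha = 1.0f - std::exp(-interpSpeed * deltaSeconds);
    state.Current += (target - state.Current) * alpha;
    return state.Current;
}

int main()
{
    FVisemeBlendState state;
    for (int frame = 0; frame < 5; ++frame)
        std::printf("weight: %.3f\n", TickViseme(state, 1.0f, true, 1.0f / 60.0f));
}
```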

Laughter Animation

You can also add laughter animations that will dynamically respond to laughter detected in the audio:

  1. Add the Blend Runtime MetaHuman Laughter node
  2. Connect your RuntimeVisemeGenerator variable to the Viseme Generator pin
  3. If you're already using lip sync:
    • Connect the output from the Blend Runtime MetaHuman Lip Sync node to the Source Pose of the Blend Runtime MetaHuman Laughter node
    • Connect the output of the Blend Runtime MetaHuman Laughter node to the Result pin of the Output Pose
  4. If using only laughter without lip sync:
    • Connect your source pose directly to the Source Pose of the Blend Runtime MetaHuman Laughter node
    • Connect the output to the Result pin

[Image: Blend Runtime MetaHuman Laughter]

When laughter is detected in the audio, your character will dynamically animate accordingly:

[Image: Laughter]

Laughter Configuration

The Blend Runtime MetaHuman Laughter node has its own configuration options:

| Property | Default | Description |
|---|---|---|
| Interpolation Speed | 25 | Controls how quickly the lip movements transition between laughter animations. Higher values result in faster, more abrupt transitions. |
| Reset Time | 0.2 | The duration in seconds after which the laughter is reset. This prevents the laughter from continuing after the audio has stopped. |
| Max Laughter Weight | 0.7 | Scales the maximum intensity of the laughter animation (0.0 - 1.0). |

Note: Laughter detection is currently available only with the Standard Model.

Combining with Existing Animations

To apply lip sync and laughter alongside existing body animations and custom facial animations without overriding them:

  1. Add a Layered blend per bone node between your body animations and the final output. Make sure Use Attached Parent is true.
  2. Configure the layer setup:
    • Add 1 item to the Layer Setup array
    • Add 3 items to the Branch Filters for the layer, with the following Bone Names:
      • FACIAL_C_FacialRoot
      • FACIAL_C_Neck2Root
      • FACIAL_C_Neck1Root
  3. Important for custom facial animations: In the Curve Blend Option, select "Use Max Value". This allows custom facial animations (expressions, emotions, etc.) to be properly layered on top of the lip sync (see the sketch after this list).
  4. Make the connections:
    • Existing animations (such as BodyPose) → Base Pose input
    • Facial animation output (from lip sync and/or laughter nodes) → Blend Poses 0 input
    • Layered blend node → Final Result pose
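
To see why "Use Max Value" matters in step 3, consider a simplified model of the curve blend: when both layers drive the same facial curve, the larger value wins, so a strong custom expression is not wiped out by a near-zero value from the lip sync layer. The sketch below models that combination rule only; it is not the engine's implementation:

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>

// Simplified model of "Use Max Value": for each curve present in either
// pose, keep the larger value, so expressions and lip sync coexist.
using FCurveMap = std::map<std::string, float>;

FCurveMap BlendCurvesUseMax(const FCurveMap& basePose, const FCurveMap& lipSyncPose)
{
    FCurveMap result = basePose;
    for (const auto& [name, value] : lipSyncPose)
    {
        auto it = result.find(name);
        result[name] = (it == result.end()) ? value : std::max(it->second, value);
    }
    return result;
}

int main()
{
    FCurveMap base = {{"CTRL_expressions_browRaiseIn", 0.8f}}; // custom expression
    FCurveMap lips = {{"CTRL_expressions_jawOpen", 0.5f}};     // lip sync output
    for (const auto& [name, value] : BlendCurvesUseMax(base, lips))
        std::printf("%s = %.2f\n", name.c_str(), value);
}
```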

[Image: Layered Blend Per Bone]

Fine-Tuning Lip Sync Behavior

Tongue Protrusion Control

In the standard lip sync model, you may notice excessive forward tongue movement during certain phonemes. To control tongue protrusion:

  1. After your lip sync blend node, add a Modify Curve node
  2. Right-click on the Modify Curve node and select Add Curve Pin
  3. Add a curve pin with the name CTRL_expressions_tongueOut
  4. Set the node's Apply Mode property to Scale
  5. Adjust the Value parameter to control tongue extension (e.g., 0.8 to reduce protrusion by 20%)

Jaw Opening Control

The realistic lip sync may produce overly responsive jaw movements depending on your audio content and visual requirements. To adjust jaw opening intensity:

  1. After your lip sync blend node, add a Modify Curve node
  2. Right-click on the Modify Curve node and select Add Curve Pin
  3. Add a curve pin with the name CTRL_expressions_jawOpen
  4. Set the node's Apply Mode property to Scale
  5. Adjust the Value parameter to control jaw opening range (e.g., 0.9 to reduce jaw movement by 10%)
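
Both procedures rely on the same Scale semantics: the node multiplies the incoming curve value by the Value parameter, so a Value of 0.9 maps a curve value of 0.6 to 0.54. A minimal sketch of that arithmetic (the helper function is ours; the curve name comes from the steps above):

```cpp
#include <cstdio>

// The Modify Curve node's Scale apply mode multiplies the incoming
// curve value by the pin's Value parameter.
float ScaleCurve(float incomingCurveValue, float scale)
{
    return incomingCurveValue * scale;
}

int main()
{
    const float jawOpen = 0.6f; // example value produced by the lip sync blend
    // Value = 0.9 reduces jaw movement by 10%; 0.8 would reduce tongueOut by 20%.
    std::printf("CTRL_expressions_jawOpen: %.2f -> %.2f\n",
                jawOpen, ScaleCurve(jawOpen, 0.9f));
}
```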

Mood-Specific Fine-Tuning

For mood-enabled models, you can fine-tune specific emotional expressions:

Eyebrow Control:

  • CTRL_L_brow_raiseIn.ty / CTRL_R_brow_raiseIn.ty - Inner eyebrow raise
  • CTRL_L_brow_raiseOut.ty / CTRL_R_brow_raiseOut.ty - Outer eyebrow raise
  • CTRL_L_brow_down.ty / CTRL_R_brow_down.ty - Eyebrow lowering

Eye Expression Control:

  • CTRL_L_eye_squintInner.ty / CTRL_R_eye_squintInner.ty - Eye squinting
  • CTRL_L_eye_cheekRaise.ty / CTRL_R_eye_cheekRaise.ty - Cheek raising

Model Comparison and Selection

Choosing Between Models

When deciding which lip sync model to use for your project, consider these factors:

| Consideration | Standard Model | Realistic Model | Mood-Enabled Realistic Model |
|---|---|---|---|
| Character Compatibility | MetaHumans and all custom character types | MetaHumans only | MetaHumans only |
| Visual Quality | Good lip sync with efficient performance | Enhanced realism with more natural mouth movements | Enhanced realism with emotional expressions |
| Performance | Optimized for all platforms including mobile/VR | Higher resource requirements | Higher resource requirements |
| Features | 14 visemes, laughter detection | 81 facial controls, 3 optimization levels | 81 facial controls, 12 moods, configurable output |
| Platform Support | Windows, Android, Quest | Windows, Mac, iOS, Linux | Windows, Mac, iOS, Linux |
| Use Cases | General applications, games, VR/AR, mobile | Cinematic experiences, close-up interactions | Emotional storytelling, advanced character interaction |

Engine Version Compatibility

UE 5.2 Compatibility Issue

If you're using Unreal Engine 5.2, the Realistic Models may not work correctly due to a bug in UE's resampling library. For UE 5.2 users who need reliable lip sync functionality, please use the Standard Model instead.

This issue is specific to UE 5.2 and does not affect other engine versions.

Performance Recommendations

  • For most projects, the Standard Model provides an excellent balance of quality and performance
  • Use the Realistic Model when you need the highest visual fidelity for MetaHuman characters
  • Use the Mood-Enabled Realistic Model when emotional expression control is important for your application
  • Consider your target platform's performance capabilities when choosing between models
  • Test different optimization levels to find the best balance for your specific use case

TTS Compatibility

| Model Type | Local TTS Support (via Runtime Text To Speech) | External TTS Support | Notes |
|---|---|---|---|
| Standard Model | ✅ Full support | ✅ Full support | Compatible with all TTS options |
| Realistic Model | ❌ Limited support | ✅ Full support | ONNX runtime conflicts with local TTS |
| Mood-Enabled Realistic Model | ✅ Full support | ✅ Full support | Compatible with all TTS options |

Troubleshooting

Common Issues

Generator Recreation for Realistic Models: For reliable and consistent operation with the Realistic Models, it's recommended to recreate the generator each time you want to feed new audio data after a period of inactivity. This is due to ONNX runtime behavior that can cause lip sync to stop working when reusing generators after periods of silence.
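
A sketch of that recreation pattern is shown below. The FRealisticGeneratorStub type, CreateGenerator factory, and one-second inactivity threshold are illustrative stand-ins, not the plugin's actual API:

```cpp
#include <memory>

// Stand-in for the generator object that wraps the ONNX session.
struct FRealisticGeneratorStub { /* ... */ };

std::unique_ptr<FRealisticGeneratorStub> CreateGenerator()
{
    return std::make_unique<FRealisticGeneratorStub>();
}

class FLipSyncSession
{
public:
    void FeedAudio(const float* samples, int count, double nowSeconds)
    {
        // Recreate after a period of inactivity so the ONNX runtime state is fresh.
        if (!Generator || nowSeconds - LastAudioSeconds > InactivityThresholdSeconds)
            Generator = CreateGenerator();
        LastAudioSeconds = nowSeconds;
        // ... pass samples to the generator here ...
        (void)samples; (void)count;
    }

private:
    std::unique_ptr<FRealisticGeneratorStub> Generator;
    double LastAudioSeconds = 0.0;
    double InactivityThresholdSeconds = 1.0; // tune for your application
};

int main()
{
    FLipSyncSession session;
    float silence[160] = {};
    session.FeedAudio(silence, 160, 0.0); // first feed creates the generator
    session.FeedAudio(silence, 160, 5.0); // long gap, so the generator is recreated
}
```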

Local TTS Compatibility: Local TTS provided by the Runtime Text To Speech plugin is not currently supported with the regular Realistic Model due to ONNX runtime conflicts. However, it is fully compatible with both the Standard Model and the Mood-Enabled Realistic Model. Use external TTS services if you need the regular Realistic Model with TTS functionality.

Performance Optimization:

  • Adjust Processing Chunk Size for Realistic models based on your performance requirements
  • Use appropriate thread counts for your target hardware
  • Consider using Mouth Only output type for mood-enabled models when full facial animation isn't needed