Plugin Configuration
Model Configuration
Standard Model Configuration
The `Create Runtime Viseme Generator` node uses default settings that work well for most scenarios. Configuration is handled through the Animation Blueprint blending node properties.
For Animation Blueprint configuration options, see the Lip Sync Configuration section below.
Realistic Model Configuration
The `Create Realistic MetaHuman Lip Sync Generator` node accepts an optional Configuration parameter that allows you to customize the generator's behavior:
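In C++, creating the generator with a custom configuration could look roughly like the sketch below. The struct, enum, and function names here are assumptions for illustration (check the plugin's headers for the real API); each field is described in the subsections that follow.

```cpp
// Illustrative sketch only - the names below are assumed, not taken from
// the plugin's actual headers.
FRealisticLipSyncConfiguration Config;                   // assumed config struct
Config.ModelType = ERealisticModelType::SemiOptimized;   // assumed enum (see Model Type)
Config.IntraOpThreads = 0;        // 0 = automatic detection (see Performance Settings)
Config.InterOpThreads = 0;        // 0 = automatic detection
Config.ProcessingChunkSize = 160; // 10 ms of audio at 16 kHz (see Processing Chunk Size)

// Assumed C++ counterpart of the Blueprint node:
auto* Generator = CreateRealisticMetaHumanLipSyncGenerator(Config);
```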
Model Type
The Model Type setting determines which version of the realistic model to use:
Model Type | Performance | Visual Quality | Noise Handling | Recommended Use Cases |
---|---|---|---|---|
Highly Optimized (Default) | Highest performance, lowest CPU usage | Good quality | May show noticeable mouth movements with background noise or non-voice sounds | Clean audio environments, performance-critical scenarios |
Semi-Optimized | Good performance, moderate CPU usage | High quality | Better stability with noisy audio | Balanced performance and quality, mixed audio conditions |
Original | Suitable for real-time use on modern CPUs | Highest quality | Most stable with background noise and non-voice sounds | High-quality productions, noisy audio environments, when maximum accuracy is needed |
Performance Settings
Intra Op Threads: Controls the number of threads used for internal model processing operations.
- 0 (Default/Automatic): Uses automatic detection (typically 1/4 of available CPU cores, maximum 4)
- 1-16: Manually specify thread count. Higher values may improve performance on multi-core systems but use more CPU
Inter Op Threads: Controls the number of threads used for parallel execution of different model operations.
- 0 (Default/Automatic): Uses automatic detection (typically 1/8 of available CPU cores, maximum 2)
- 1-8: Manually specify thread count. Usually kept low for real-time processing
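For reference, the automatic defaults described above amount to the following heuristic, sketched in UE-style C++ (an illustration of the documented behavior, not the plugin's actual code):

```cpp
// Automatic thread-count detection as described above (illustrative only).
const int32 NumCores = FPlatformMisc::NumberOfCoresIncludingHyperthreads();
const int32 IntraOpThreads = FMath::Clamp(NumCores / 4, 1, 4); // ~1/4 of cores, max 4
const int32 InterOpThreads = FMath::Clamp(NumCores / 8, 1, 2); // ~1/8 of cores, max 2
```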
Processing Chunk Size
The Processing Chunk Size determines how many samples are processed in each inference step. The default value is 160 samples (10ms of audio at 16kHz):
- Smaller values provide more frequent updates but increase CPU usage
- Larger values reduce CPU load but may decrease lip sync responsiveness
- Recommended to use multiples of 160 for optimal alignment
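For example, to target a specific update interval you can derive the chunk size from the 16 kHz sample rate and snap it to a multiple of 160 (plain math from the values above, not a plugin API):

```cpp
// 160 samples = 10 ms at 16 kHz, so N ms of audio is (N / 10) * 160 samples.
const int32 SampleRate = 16000;
const int32 TargetMs = 30;                                     // desired update interval
const int32 RawSamples = SampleRate * TargetMs / 1000;         // 480 samples
const int32 ChunkSize = FMath::Max(1, RawSamples / 160) * 160; // snapped to 480
```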
Mood-Enabled Model Configuration
The `Create Realistic MetaHuman Lip Sync With Mood Generator` node provides additional configuration options beyond the basic realistic model:
Basic Configuration
Lookahead Ms: Lookahead timing in milliseconds for improved lip sync accuracy.
- Default: 80ms
- Range: 20ms to 200ms (must be divisible by 20)
- Higher values provide better synchronization but increase latency
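If you compute the lookahead dynamically, you can snap it into the valid range like this (plain math from the constraints above):

```cpp
// Snap to a multiple of 20 and clamp to the documented 20-200 ms range.
const int32 RequestedMs = 90;
const int32 LookaheadMs = FMath::Clamp((RequestedMs / 20) * 20, 20, 200); // -> 80 ms
```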
Output Type: Controls which facial controls are generated.
- Full Face: All 81 facial controls (eyebrows, eyes, nose, mouth, jaw, tongue)
- Mouth Only: Only mouth, jaw, and tongue-related controls
Performance Settings: Uses the same Intra Op Threads and Inter Op Threads settings as the regular realistic model.
Mood Settings
Available Moods:
- Neutral, Happy, Sad, Disgust, Anger, Surprise, Fear
- Confident, Excited, Bored, Playful, Confused
Mood Intensity: Controls how strongly the mood affects the animation (0.0 to 1.0)
Runtime Mood Control
You can adjust mood settings at runtime using the following functions:
- Set Mood: Change the current mood type
- Set Mood Intensity: Adjust how strongly the mood affects the animation (0.0 to 1.0)
- Set Lookahead Ms: Modify the lookahead timing for synchronization
- Set Output Type: Switch between Full Face and Mouth Only controls
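In C++ these calls could look roughly like the sketch below; the generator class, enums, and exact signatures are assumptions based on the Blueprint node names (check the plugin's headers for the real API):

```cpp
// Hypothetical C++ equivalents of the runtime mood functions listed above.
Generator->SetMood(EMood::Happy);                 // change the current mood type
Generator->SetMoodIntensity(0.8f);                // 0.0 - 1.0
Generator->SetLookaheadMs(100);                   // 20-200 ms, multiple of 20
Generator->SetOutputType(EOutputType::MouthOnly); // or EOutputType::FullFace
```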
Mood Selection Guide
Choose appropriate moods based on your content:
Mood | Best For | Typical Intensity Range |
---|---|---|
Neutral | General conversation, narration, default state | 0.5 - 1.0 |
Happy | Positive content, cheerful dialogue, celebrations | 0.6 - 1.0 |
Sad | Melancholic content, emotional scenes, somber moments | 0.5 - 0.9 |
Disgust | Negative reactions, distasteful content, rejection | 0.4 - 0.8 |
Anger | Aggressive dialogue, confrontational scenes, frustration | 0.6 - 1.0 |
Surprise | Unexpected events, revelations, shock reactions | 0.7 - 1.0 |
Fear | Threatening situations, anxiety, nervous dialogue | 0.5 - 0.9 |
Confident | Professional presentations, leadership dialogue, assertive speech | 0.7 - 1.0 |
Excited | Energetic content, announcements, enthusiastic dialogue | 0.8 - 1.0 |
Bored | Monotonous content, disinterested dialogue, tired speech | 0.3 - 0.7 |
Playful | Casual conversation, humor, light-hearted interactions | 0.6 - 0.9 |
Confused | Question-heavy dialogue, uncertainty, bewilderment | 0.4 - 0.8 |
Animation Blueprint Configuration
Lip Sync Configuration
Standard Model
The `Blend Runtime MetaHuman Lip Sync` node has configuration options in its properties panel:
Property | Default | Description |
---|---|---|
Interpolation Speed | 25 | Controls how quickly the lip movements transition between visemes. Higher values result in faster, more abrupt transitions. |
Reset Time | 0.2 | The duration in seconds after which the lip sync is reset. This is useful to prevent the lip sync from continuing after the audio has stopped. |
Realistic Models
The `Blend Realistic MetaHuman Lip Sync` node has configuration options in its properties panel:
Property | Default | Description |
---|---|---|
Interpolation Speed | 30 | Controls how quickly the lip movements transition between positions. Higher values result in faster, more abrupt transitions. |
Reset Time | 0.2 | The duration in seconds after which the lip sync is reset. This is useful to prevent the lip sync from continuing after the audio has stopped. |
Note: The same Animation Blueprint node is used for both regular and mood-enabled realistic models.
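Conceptually, the two properties behave like the sketch below, assuming a simple exponential-approach model (this illustrates the described behavior and is not the plugin's actual blending code):

```cpp
// Illustrative per-tick blending model for Interpolation Speed and Reset Time.
void TickLipSyncBlend(float DeltaTime, float InterpolationSpeed, float ResetTime,
                      float& TimeSinceLastAudio, float& TargetWeight, float& CurrentWeight)
{
    // After ResetTime seconds without audio, blend back toward the neutral pose.
    TimeSinceLastAudio += DeltaTime;
    if (TimeSinceLastAudio >= ResetTime) // e.g. 0.2 s after the audio stops
    {
        TargetWeight = 0.f;
    }
    // Higher Interpolation Speed -> larger step toward the target each tick.
    const float Alpha = FMath::Clamp(InterpolationSpeed * DeltaTime, 0.f, 1.f);
    CurrentWeight = FMath::Lerp(CurrentWeight, TargetWeight, Alpha);
}
```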
Laughter Animation
You can also add laughter animations that will dynamically respond to laughter detected in the audio:
- Add the `Blend Runtime MetaHuman Laughter` node
- Connect your `RuntimeVisemeGenerator` variable to the `Viseme Generator` pin
- If you're already using lip sync:
  - Connect the output from the `Blend Runtime MetaHuman Lip Sync` node to the `Source Pose` of the `Blend Runtime MetaHuman Laughter` node
  - Connect the output of the `Blend Runtime MetaHuman Laughter` node to the `Result` pin of the `Output Pose`
- If using only laughter without lip sync:
  - Connect your source pose directly to the `Source Pose` of the `Blend Runtime MetaHuman Laughter` node
  - Connect the output to the `Result` pin
When laughter is detected in the audio, your character will animate accordingly.
Laughter Configuration
The `Blend Runtime MetaHuman Laughter` node has its own configuration options:
Property | Default | Description |
---|---|---|
Interpolation Speed | 25 | Controls how quickly the laughter animation transitions. Higher values result in faster, more abrupt transitions. |
Reset Time | 0.2 | The duration in seconds after which the laughter is reset. This is useful to prevent the laughter from continuing after the audio has stopped. |
Max Laughter Weight | 0.7 | Scales the maximum intensity of the laughter animation (0.0 - 1.0). |
Note: Laughter detection is currently available only with the Standard Model.
Combining with Existing Animations
To apply lip sync and laughter alongside existing body animations and custom facial animations without overriding them:
- Add a `Layered blend per bone` node between your body animations and the final output. Make sure `Use Attached Parent` is true.
- Configure the layer setup:
  - Add 1 item to the `Layer Setup` array
  - Add 3 items to the `Branch Filters` for the layer, with the following `Bone Name`s:
    - `FACIAL_C_FacialRoot`
    - `FACIAL_C_Neck2Root`
    - `FACIAL_C_Neck1Root`
- Important for custom facial animations: in the `Curve Blend Option`, select "Use Max Value". This allows custom facial animations (expressions, emotions, etc.) to be properly layered on top of the lip sync.
- Make the connections:
  - Existing animations (such as `BodyPose`) → `Base Pose` input
  - Facial animation output (from lip sync and/or laughter nodes) → `Blend Poses 0` input
  - Layered blend node → final `Result` pose
Fine-Tuning Lip Sync Behavior
Tongue Protrusion Control
In the standard lip sync model, you may notice excessive forward tongue movement during certain phonemes. To control tongue protrusion:
- After your lip sync blend node, add a `Modify Curve` node
- Right-click on the `Modify Curve` node and select Add Curve Pin
- Add a curve pin with the name `CTRL_expressions_tongueOut`
- Set the node's Apply Mode property to Scale
- Adjust the Value parameter to control tongue extension (e.g., 0.8 to reduce protrusion by 20%)
Jaw Opening Control
The realistic lip sync may produce overly responsive jaw movements depending on your audio content and visual requirements. To adjust jaw opening intensity:
- After your lip sync blend node, add a `Modify Curve` node
- Right-click on the `Modify Curve` node and select Add Curve Pin
- Add a curve pin with the name `CTRL_expressions_jawOpen`
- Set the node's Apply Mode property to Scale
- Adjust the Value parameter to control jaw opening range (e.g., 0.9 to reduce jaw movement by 10%)
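In Scale mode, the `Modify Curve` node multiplies the incoming curve value by the pin value, so both adjustments above reduce to a single multiplication (plain math shown for clarity; the variable names are illustrative):

```cpp
// Scale apply mode: output = incoming curve value * pin value.
const float TongueOut = IncomingTongueOut * 0.8f; // 20% less tongue protrusion
const float JawOpen   = IncomingJawOpen   * 0.9f; // 10% less jaw opening
```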
Mood-Specific Fine-Tuning
For mood-enabled models, you can fine-tune specific emotional expressions:
Eyebrow Control:
- `CTRL_L_brow_raiseIn.ty` / `CTRL_R_brow_raiseIn.ty` - Inner eyebrow raise
- `CTRL_L_brow_raiseOut.ty` / `CTRL_R_brow_raiseOut.ty` - Outer eyebrow raise
- `CTRL_L_brow_down.ty` / `CTRL_R_brow_down.ty` - Eyebrow lowering
Eye Expression Control:
- `CTRL_L_eye_squintInner.ty` / `CTRL_R_eye_squintInner.ty` - Eye squinting
- `CTRL_L_eye_cheekRaise.ty` / `CTRL_R_eye_cheekRaise.ty` - Cheek raising
Model Comparison and Selection
Choosing Between Models
When deciding which lip sync model to use for your project, consider these factors:
Consideration | Standard Model | Realistic Model | Mood-Enabled Realistic Model |
---|---|---|---|
Character Compatibility | MetaHumans and all custom character types | MetaHumans only | MetaHumans only |
Visual Quality | Good lip sync with efficient performance | Enhanced realism with more natural mouth movements | Enhanced realism with emotional expressions |
Performance | Optimized for all platforms including mobile/VR | Higher resource requirements | Higher resource requirements |
Features | 14 visemes, laughter detection | 81 facial controls, 3 optimization levels | 81 facial controls, 12 moods, configurable output |
Platform Support | Windows, Android, Quest | Windows, Mac, iOS, Linux | Windows, Mac, iOS, Linux |
Use Cases | General applications, games, VR/AR, mobile | Cinematic experiences, close-up interactions | Emotional storytelling, advanced character interaction |
Engine Version Compatibility
If you're using Unreal Engine 5.2, the Realistic Models may not work correctly due to a bug in UE's resampling library. For UE 5.2 users who need reliable lip sync functionality, please use the Standard Model instead.
This issue is specific to UE 5.2 and does not affect other engine versions.
Performance Recommendations
- For most projects, the Standard Model provides an excellent balance of quality and performance
- Use the Realistic Model when you need the highest visual fidelity for MetaHuman characters
- Use the Mood-Enabled Realistic Model when emotional expression control is important for your application
- Consider your target platform's performance capabilities when choosing between models
- Test different optimization levels to find the best balance for your specific use case
TTS Compatibility
Model Type | Local TTS Support (via Runtime Text To Speech) | External TTS Support | Notes |
---|---|---|---|
Standard Model | ✅ Full support | ✅ Full support | Compatible with all TTS options |
Realistic Model | ❌ Not supported | ✅ Full support | ONNX runtime conflicts with local TTS |
Mood-Enabled Realistic Model | ✅ Full support | ✅ Full support | Compatible with all TTS options |
Troubleshooting
Common Issues
Generator Recreation for Realistic Models: For reliable and consistent operation with the Realistic Models, it's recommended to recreate the generator each time you want to feed new audio data after a period of inactivity. This is due to ONNX runtime behavior that can cause lip sync to stop working when reusing generators after periods of silence.
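A minimal sketch of that workaround, assuming a factory function and feed method named after the Blueprint nodes (the plugin's real API may differ):

```cpp
// Illustrative recreation pattern - the factory and feed functions are
// assumed names, not the plugin's confirmed API.
if (TimeSinceLastAudio > InactivityThreshold)
{
    // Drop the stale generator and create a fresh one before feeding audio.
    Generator = CreateRealisticMetaHumanLipSyncGenerator(Config); // assumed factory
    TimeSinceLastAudio = 0.f;
}
Generator->ProcessAudioData(AudioChunk); // assumed feed function
```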
Local TTS Compatibility: Local TTS provided by the Runtime Text To Speech plugin is not currently supported with the regular Realistic model due to ONNX runtime conflicts. It is, however, fully compatible with both the Standard model and the Mood-Enabled Realistic model. Use external TTS services if you specifically need the regular Realistic model with TTS functionality.
Performance Optimization:
- Adjust Processing Chunk Size for Realistic models based on your performance requirements
- Use appropriate thread counts for your target hardware
- Consider using Mouth Only output type for mood-enabled models when full facial animation isn't needed