How to use the plugin

This guide covers the full runtime API: creating an LLM instance, loading models, sending messages, downloading models at runtime, managing state, and utility functions.

Create an LLM Instance

Start by creating a Runtime Local LLM object. Maintain a reference to it (e.g. as a variable in Blueprints or a UPROPERTY in C++) to prevent premature garbage collection.

Create Runtime Local LLM

Load a Model

You must load a model before sending messages. The plugin provides several loading methods depending on your workflow.

Load by Name

If you manage models through the editor settings panel, use Load Model (By Name).

In UE 5.4 and later, Load Model (By Name) presents a dropdown of all models on disk - simply select the model you want to load.

Load Model By Name UE 5.4+

Load from File Path

Load a model directly from an absolute file path to a .gguf file:

Load Model From File

Load from URL (Download and Load)

Download a model from a URL (if not already on disk) and load it automatically. If the file already exists locally, the download is skipped.

The simplest variant takes only a URL - metadata is derived from the filename:

Load Model From URL Simple

You can also use Load Model From URL with full model metadata for richer model information:

Load Model From URL
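The skip-if-present behavior described above can be sketched in standalone C++. This is only an illustration, not the plugin's implementation: `FileNameFromUrl`, `NeedsDownload`, and the idea that the local file name is the last URL path segment are assumptions made for the example.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: take the last path segment of the URL as the file name,
// ignoring any query string or fragment.
std::string FileNameFromUrl(const std::string& Url) {
    const std::size_t End = Url.find_first_of("?#");
    const std::string Path = Url.substr(0, End);
    const std::size_t Slash = Path.find_last_of('/');
    return Slash == std::string::npos ? Path : Path.substr(Slash + 1);
}

// Hypothetical check: a download is only needed when the file is not
// already present in the (assumed) local models directory.
bool NeedsDownload(const std::string& Url, const std::filesystem::path& ModelsDir) {
    return !std::filesystem::exists(ModelsDir / FileNameFromUrl(Url));
}
```

A caller would run the check first and go straight to loading when it returns false.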

Async Load (Blueprint)

To handle load completion and errors via output pins instead of binding delegates manually, two async nodes are available.

Load Model By Name (Async) mirrors Load Model (By Name) - in UE 5.4+ it presents a dropdown of all models on disk:

Load Model By Name Async UE 5.4+

Load Model From File (Async) takes an absolute file path instead:

Load Model From File Async

Bind Events

Bind to the LLM instance's delegates to receive callbacks. All callbacks fire on the game thread.

Bind Events

Available delegates:

  • On Token Generated: Fires for each output token
  • On Generation Complete: Fires when the full response is ready, with duration, token count, and tokens-per-second
  • On Prompt Processed: Fires after the input prompt is processed, before generation begins
  • On Error: Fires if an error occurs during any operation
  • On Model Loaded: Fires when a model finishes loading
  • On Model Unloaded: Fires when the model is unloaded
  • On Download Progress: Fires periodically during a model download (progress fraction, bytes received, total bytes)
  • On Model Downloaded: Fires when a download-only operation completes
  • On Conversation Saved: Fires when a conversation has been written to a JSON file
  • On Conversation Loaded: Fires when a conversation has been loaded from a file or memory snapshot
  • On History Summarized: Fires when auto-summarization compresses older messages (reports message count, tokens saved, and the summary)

Send Messages

Once a model is loaded, send a user message to generate a response:

Send Message

To override the system prompt for a specific message, use Send Message With System Prompt:

Send Message With System Prompt

Tokens stream through OnTokenGenerated as they are produced. When generation finishes, OnGenerationComplete fires with the full response, duration, token count, and tokens-per-second.
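To make the callback flow concrete, here is a minimal standalone sketch that mimics it with plain `std::function` callbacks. `GenerationStats`, `SimulateGeneration`, and the callback aliases are hypothetical stand-ins, not the plugin's types; the sketch also assumes tokens-per-second is simply token count divided by duration.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical summary passed to the completion callback.
struct GenerationStats {
    std::string FullResponse;
    double DurationSeconds = 0.0;
    int TokenCount = 0;
    double TokensPerSecond = 0.0;
};

using TokenCallback = std::function<void(const std::string&)>;
using CompleteCallback = std::function<void(const GenerationStats&)>;

// Simulates one generation run: each token is reported as it is "produced",
// then the completion callback fires once with the aggregated result.
void SimulateGeneration(const std::vector<std::string>& Tokens,
                        double DurationSeconds,
                        const TokenCallback& OnToken,
                        const CompleteCallback& OnComplete) {
    GenerationStats Stats;
    Stats.DurationSeconds = DurationSeconds;
    for (const std::string& Token : Tokens) {
        Stats.FullResponse += Token;
        ++Stats.TokenCount;
        if (OnToken) OnToken(Token);
    }
    // Assumed definition: tokens generated divided by elapsed seconds.
    Stats.TokensPerSecond =
        DurationSeconds > 0.0 ? Stats.TokenCount / DurationSeconds : 0.0;
    if (OnComplete) OnComplete(Stats);
}
```

A chat UI would typically append each streamed token to the visible text and use the completion callback only for final bookkeeping, since the full response equals the concatenation of the streamed tokens.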

Async Send Message (Blueprint)

The Send LLM Message (Async) node provides dedicated output pins for tokens, completion, and errors:

Async Send Message

Download Models at Runtime

Besides the download-and-load flow described above, you can download a model to disk without loading it. This is useful for pre-caching models in a loading screen or settings menu.

Download Model

A URL-only variant is also available:

Download Model From URL

The Download LLM Model (Async) and Download LLM Model From URL (Async) nodes provide output pins for progress, completion, and errors:

Async Download Model

The OnDownloadProgress delegate reports progress during the download. OnModelDownloaded fires when the file is saved to disk.

To cancel an in-progress download:

Cancel Download

The plugin prevents duplicate downloads automatically - if a download is already in progress for the same model, subsequent calls are ignored.
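If you track downloads in your own UI code, the same idea can be sketched as follows. `DownloadTracker` is a hypothetical class written for this example and says nothing about how the plugin implements its guard internally.

```cpp
#include <set>
#include <string>

// Hypothetical tracker for in-flight downloads, keyed by model name.
class DownloadTracker {
public:
    // Returns true if the download may start; false if one is already running
    // for the same model (the duplicate request is ignored).
    bool TryStart(const std::string& ModelName) {
        return InFlight.insert(ModelName).second;
    }

    // Call on completion, failure, or cancellation.
    void Finish(const std::string& ModelName) { InFlight.erase(ModelName); }

    bool IsDownloading(const std::string& ModelName) const {
        return InFlight.count(ModelName) > 0;
    }

private:
    std::set<std::string> InFlight;
};
```

This is useful for greying out a "Download" button while a request for that model is pending.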

Stop Generation

To interrupt an ongoing generation:

Stop Generation

Reset Conversation Context

Clear the conversation history to start a new conversation:

Reset Context

Save and Load Conversations

The plugin can persist conversation history to disk as JSON or keep it in memory as a snapshot. By default, the system prompt is excluded from saves, so the same conversation history can be loaded into different LLM instances with different system rules. This is useful for multi-NPC scenarios, where each character has its own memory but may share or differ in their system instructions.

Save to File

Save the current conversation to a JSON file on disk:

Save Conversation To File

The Include System Prompt parameter controls whether the system message (if present) is written to the file. It defaults to false, which keeps saved conversations portable between NPCs.

On Conversation Saved fires when the file is written.

Load from File

Load a conversation back from a JSON file:

Load Conversation From File

The Preserve Current System Prompt parameter (default true) keeps the currently loaded system prompt intact while swapping in the saved conversation history. This is the recommended setting for NPC memory swapping.

On Conversation Loaded fires with the loaded snapshot.

In-Memory Snapshots (Multi-NPC Workflow)

For fast NPC swapping during gameplay, snapshot the current conversation into memory rather than writing to disk. This is the recommended way to manage many NPCs sharing a single loaded model.

The typical multi-NPC pattern uses a Map of Name → LLM Conversation Snapshot on your NPC manager or game state:

  1. When switching away from an NPC: call Save Conversation To Memory, then in On Conversation Loaded (which also fires for snapshot delivery), store the snapshot in your map keyed by NPC name.
  2. When switching to another NPC: read the snapshot from your map and call Load Conversation From Memory with Preserve Current System Prompt enabled.

Multi NPC Pattern
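The two steps above can be sketched in standalone C++. Everything here is a simplified stand-in: `ChatMessage`, `ConversationSnapshot`, `FakeLLM`, and `NpcMemoryManager` are hypothetical names, and the save is synchronous, whereas the plugin delivers the snapshot through On Conversation Loaded.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the plugin's message and snapshot types.
struct ChatMessage { std::string Role; std::string Content; };
struct ConversationSnapshot { std::vector<ChatMessage> Messages; };

// Minimal fake LLM: history is kept separately from the system prompt.
struct FakeLLM {
    std::string SystemPrompt;
    std::vector<ChatMessage> History;

    ConversationSnapshot SaveToMemory() const {
        ConversationSnapshot Snapshot;
        Snapshot.Messages = History;
        return Snapshot;
    }

    // Mirrors "Preserve Current System Prompt": only the history is replaced.
    void LoadFromMemory(const ConversationSnapshot& Snapshot) {
        History = Snapshot.Messages;
    }
};

// Keeps one snapshot per NPC name and swaps them in and out of a shared LLM.
class NpcMemoryManager {
public:
    void SwitchNpc(FakeLLM& LLM, const std::string& From, const std::string& To) {
        Snapshots[From] = LLM.SaveToMemory();      // step 1: store outgoing NPC
        const auto It = Snapshots.find(To);        // step 2: restore incoming NPC
        LLM.LoadFromMemory(It != Snapshots.end() ? It->second
                                                 : ConversationSnapshot{});
    }

private:
    std::map<std::string, ConversationSnapshot> Snapshots;
};
```

An NPC that has never been spoken to starts with an empty history, while the system prompt on the shared instance is left untouched by the swap.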

Since the system prompt stays loaded across swaps, each NPC's "personality" can either be encoded in a per-NPC system prompt (call Send Message With System Prompt once after a swap to update it) or shared across all NPCs.

Tip: Snapshots are model-agnostic - they store messages, not KV cache state. The same snapshot can be loaded into a different model (though the conversational style may shift). The OriginModelFamilyName field on the snapshot lets you check which model produced it, if you want to enforce compatibility.

Automatic Context Summarization

Long conversations eventually exceed the model's context window, which would normally either truncate the history or cause errors. The plugin's auto-summarization feature monitors context usage and, when a configured threshold is exceeded, summarizes older messages into a single "memory" message before the next response is generated. This keeps token costs and latency stable across indefinitely long conversations.

The summarization is performed by the same loaded model, so no second model or API call is needed.

Enable Auto-Summarization

Enable Auto Summarization

Use Get Default Summarization Config for sensible starting defaults, then adjust as needed:

Get Default Summarization Config

Once enabled, summarization runs automatically before each SendMessage call when needed; no further action is required.

Configuration Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Trigger Token Threshold | int32 | 1500 | Summarization runs when used context tokens exceed this value. Set this relative to your Context Size; around 60-75% is a good rule of thumb |
| Keep Recent Message Count | int32 | 4 | The most recent N messages are never summarized, preserving immediate conversational coherence |
| Min Messages To Summarize | int32 | 6 | Skip summarization if fewer than this many older messages are eligible (avoids pointless tiny summaries) |
| Max Summary Tokens | int32 | 256 | Maximum length of the generated summary in tokens |
| Preserve System Prompt | bool | true | Always keep the system message (index 0) intact |
| Summarization Instruction | FString | (see default) | The instruction sent to the model to produce the summary |
| Summary Message Prefix | FString | "[Long-term memory summary of earlier conversation]: " | Prepended to the generated summary when it's inserted into the conversation as an assistant-role memory message |
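As a rough illustration of how these parameters interact, the sketch below mirrors the documented defaults in a plain struct and works out whether a summarization pass would run. The struct and function names are hypothetical, and the eligibility logic is inferred from the parameter descriptions rather than taken from the plugin's source.

```cpp
#include <cstdint>

// Hypothetical mirror of the numeric and boolean settings, with documented defaults.
struct SummarizationConfig {
    std::int32_t TriggerTokenThreshold = 1500;
    std::int32_t KeepRecentMessageCount = 4;
    std::int32_t MinMessagesToSummarize = 6;
    std::int32_t MaxSummaryTokens = 256;
    bool bPreserveSystemPrompt = true;
};

// Derives a threshold from the context size using the 60-75% rule of thumb.
std::int32_t ThresholdForContext(std::int32_t ContextSize, double Fraction = 0.7) {
    return static_cast<std::int32_t>(ContextSize * Fraction);
}

// Returns how many older messages would be summarized, or 0 if the pass is
// skipped (threshold not exceeded, or too few eligible messages).
std::int32_t EligibleMessageCount(const SummarizationConfig& Config,
                                  std::int32_t UsedTokens,
                                  std::int32_t MessageCount,
                                  bool bHasSystemPrompt) {
    if (UsedTokens <= Config.TriggerTokenThreshold) return 0;
    const std::int32_t Protected =
        Config.KeepRecentMessageCount +
        ((Config.bPreserveSystemPrompt && bHasSystemPrompt) ? 1 : 0);
    const std::int32_t Eligible = MessageCount - Protected;
    return Eligible >= Config.MinMessagesToSummarize ? Eligible : 0;
}
```

For example, with a context size of 4096 and a fraction of 0.75, the derived threshold is 3072 tokens.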

Manual Trigger and Listening for Summaries

You can trigger summarization manually at any point regardless of threshold:

Summarize Now

Bind to On History Summarized to be notified when a summarization pass completes. The event reports how many messages were removed, how many tokens were saved, and the generated summary text. This is useful for showing a subtle indicator in a chat UI:

On History Summarized

Disable Auto-Summarization

Disable Auto Summarization

Disabling does not undo summaries already applied to the conversation.

Note: Summarization takes a moment to run on the background thread (the model is generating the summary). Token-stream callbacks are suppressed during this internal generation so they won't appear in your chat UI. On History Summarized fires once the splice is complete, just before the user's actual message is processed.

Unload a Model

Free resources when a model is no longer needed:

Unload Model

Query State

Check the current state of the LLM instance:

Query State

  • Is Model Loaded: True if a model is ready for inference
  • Is Generating: True if generation is in progress
  • Is Busy: True if any operation (loading, generating, downloading) is active
  • Is Downloading: True if a model download is in progress
  • Get Loaded Model Metadata: Returns metadata of the current model
  • Get Applied Inference Params: Returns the parameters applied when loading

Model Library Functions

A set of static utility functions is provided for managing model files on disk. These are useful for building model selection UI or checking model availability at runtime.

Get Downloaded Model Names / Metadata

Get Downloaded Model Names

Get All Downloaded Model Metadata

Check If a Model Is on Disk

Is Model On Disk

Get Model File Path

Get Model File Path

Delete Model Files

Delete Model Files

Get Pre-defined and Available Models

Get Predefined Models

Get All Available Models

Build Metadata from a URL

Construct model metadata from a raw URL (fields are derived from the filename):

Make Metadata From URL

Utility Functions

A set of helper functions is provided for formatting and error display.

Bytes to Readable String

Converts a byte count to a human-readable string (e.g. "4.07 GB"). Useful for displaying model sizes in UI.

Bytes to Readable String
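For reference, a formatter of this kind can be sketched in standalone C++ as below. `BytesToReadable` is a hypothetical name, and the use of binary units (1 KB = 1024 bytes) with two decimal places is an assumption; the plugin's exact rounding and unit base may differ.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical formatter: binary units (assumed), two decimals above bytes.
std::string BytesToReadable(std::uint64_t Bytes) {
    static const char* const Units[] = {"B", "KB", "MB", "GB", "TB"};
    double Value = static_cast<double>(Bytes);
    int Unit = 0;
    while (Value >= 1024.0 && Unit < 4) {
        Value /= 1024.0;
        ++Unit;
    }
    char Buffer[32];
    if (Unit == 0) {
        std::snprintf(Buffer, sizeof(Buffer), "%.0f B", Value);
    } else {
        std::snprintf(Buffer, sizeof(Buffer), "%.2f %s", Value, Units[Unit]);
    }
    return std::string(Buffer);
}
```

Under these assumptions, `BytesToReadable(4370129224)` yields "4.07 GB" and `BytesToReadable(512)` yields "512 B".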

Format Download Progress

Formats a download progress string like "1.23 GB / 4.07 GB (30.2%)". If the total size is unknown, returns just the received amount.

Format Download Progress
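The output shape described above can be reproduced with a small standalone sketch. `ReadableBytes` and `FormatProgress` are hypothetical names; binary units and treating a total of 0 as "unknown" are assumptions made for the example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical byte formatter using binary units (assumed: 1 KB = 1024 B).
std::string ReadableBytes(std::uint64_t Bytes) {
    static const char* const Units[] = {"B", "KB", "MB", "GB", "TB"};
    double Value = static_cast<double>(Bytes);
    int Unit = 0;
    while (Value >= 1024.0 && Unit < 4) {
        Value /= 1024.0;
        ++Unit;
    }
    char Buffer[32];
    if (Unit == 0) {
        std::snprintf(Buffer, sizeof(Buffer), "%.0f B", Value);
    } else {
        std::snprintf(Buffer, sizeof(Buffer), "%.2f %s", Value, Units[Unit]);
    }
    return std::string(Buffer);
}

// Builds "received / total (percent%)"; a total of 0 is treated as unknown
// and only the received amount is returned.
std::string FormatProgress(std::uint64_t Received, std::uint64_t Total) {
    if (Total == 0) return ReadableBytes(Received);
    const double Percent =
        100.0 * static_cast<double>(Received) / static_cast<double>(Total);
    char Buffer[32];
    std::snprintf(Buffer, sizeof(Buffer), " (%.1f%%)", Percent);
    return ReadableBytes(Received) + " / " + ReadableBytes(Total) + Buffer;
}
```

With 1320702444 bytes received out of 4370129224, this returns "1.23 GB / 4.07 GB (30.2%)".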

Get Error Description / Error Code String

Get LLM Error Description returns a human-readable text description for an error code. Get LLM Error Code String returns the enum value name as a string (useful for logging).

Get Error Description

Error Codes Reference

| Code | Value | Description |
| --- | --- | --- |
| Unknown | 0 | An unspecified error |
| ModelLoadFailed | 10 | The GGUF file failed to load (corrupt file, incompatible format, etc.) |
| ContextCreateFailed | 11 | Failed to create the inference context |
| ModelNotLoaded | 20 | Inference was attempted with no model loaded |
| ChatTemplateFailed | 21 | The model's chat template failed to apply |
| TokenizationFailed | 22 | The input text could not be tokenized |
| ContextOverflow | 23 | The prompt + context exceeds the configured context size |
| PromptDecodeFailed | 24 | The prompt tokens failed to decode |
| ContextTooFullToGenerate | 25 | Not enough context space remaining to generate output |
| GenerationDecodeFailed | 30 | A token failed to decode during generation |
| GenerationTruncated | 31 | Generation stopped because the max token limit was reached |
| LLMInstanceNull | 40 | The LLM instance is null or invalid |
| ModelNotFoundOnDisk | 41 | The model file does not exist at the expected path |
| ModelURLEmpty | 42 | A download was requested with an empty URL |
| ModelDownloadCancelled | 43 | The download was cancelled |
| ModelDownloadEmptyData | 44 | The download completed but the response body was empty |
| ModelDownloadSaveFailed | 45 | The download completed but the file could not be saved to disk |