How to use the plugin

Unreleased Plugin

The documentation you are currently viewing is for a plugin that has not yet been released. Content may be incomplete or subject to change. Please check back once the plugin is officially available on the Fab marketplace.

This guide covers the full runtime API: creating an LLM instance, loading models, sending messages, downloading models at runtime, managing state, and utility functions.

Create an LLM Instance

Start by creating a Runtime Local LLM object. Maintain a reference to it (e.g. as a variable in Blueprints or a UPROPERTY in C++) to prevent premature garbage collection.

Create Runtime Local LLM
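In C++, the equivalent might look like the sketch below. The type and header names (`URuntimeLocalLLM`, `RuntimeLocalLLM.h`) are assumptions derived from the Blueprint node name and may differ in the released plugin:

```cpp
// Sketch only: the class name and header are assumed from the
// "Create Runtime Local LLM" node and may not match the shipped API.
#include "RuntimeLocalLLM.h" // hypothetical plugin header

UCLASS()
class AMyLLMActor : public AActor
{
    GENERATED_BODY()

public:
    // Holding the instance in a UPROPERTY keeps it referenced, which
    // prevents premature garbage collection while the actor is alive.
    UPROPERTY()
    TObjectPtr<URuntimeLocalLLM> LLM;

    virtual void BeginPlay() override
    {
        Super::BeginPlay();
        LLM = NewObject<URuntimeLocalLLM>(this);
    }
};
```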

Load a Model

You must load a model before sending messages. The plugin provides several loading methods depending on your workflow.

Load by Name

If you manage models through the editor settings panel, use Load Model (By Name).

In UE 5.4 and later, Load Model (By Name) presents a dropdown of all models on disk - simply select the model you want to load.

Load Model By Name UE 5.4+
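A possible C++ mirror of this node, assuming a `LoadModelByName` function exists with this name (the model name below is a placeholder):

```cpp
// Sketch: assumed C++ equivalent of "Load Model (By Name)".
if (LLM)
{
    LLM->LoadModelByName(TEXT("my-model"));
}
```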

Load from File Path

Load a model directly from an absolute file path to a .gguf file:

Load Model From File
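In C++ this might look as follows; the function name is assumed from the node, and the path is a placeholder:

```cpp
// Sketch: assumed C++ equivalent of "Load Model From File".
// Pass an absolute path to a .gguf file.
const FString ModelPath = TEXT("C:/Models/my-model.gguf");
LLM->LoadModelFromFile(ModelPath);
```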

Load from URL (Download and Load)

Download a model from a URL (if not already on disk) and load it automatically. If the file already exists locally, the download is skipped.

The simplest variant takes only a URL - metadata is derived from the filename:

Load Model From URL Simple

You can also use Load Model From URL with full model metadata for richer model information:

Load Model From URL
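A hedged C++ sketch of the URL-only variant (function name assumed from the node, URL is a placeholder):

```cpp
// Sketch: assumed C++ equivalent of "Load Model From URL".
// Skips the download if the file is already on disk, per the
// behavior described above.
LLM->LoadModelFromURL(TEXT("https://example.com/models/my-model.gguf"));
```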

Async Load (Blueprint)

To handle load completion and errors via output pins instead of binding delegates manually, two async nodes are available.

Load Model By Name (Async) mirrors Load Model (By Name) - in UE 5.4+ it presents a dropdown of all models on disk:

Load Model By Name Async UE 5.4+

Load Model From File (Async) takes an absolute file path instead:

Load Model From File Async

Bind Events

Bind to the LLM instance's delegates to receive callbacks. All callbacks fire on the game thread.

Bind Events

Available delegates:

  • On Token Generated: Fires for each output token
  • On Generation Complete: Fires when the full response is ready, with duration, token count, and tokens-per-second
  • On Prompt Processed: Fires after the input prompt is processed, before generation begins
  • On Error: Fires if an error occurs during any operation
  • On Model Loaded: Fires when a model finishes loading
  • On Model Unloaded: Fires when the model is unloaded
  • On Download Progress: Fires periodically during a model download (progress fraction, bytes received, total bytes)
  • On Model Downloaded: Fires when a download-only operation completes
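In C++ the same bindings might look like the sketch below. The delegate names come from the list above, but the handler signatures are assumptions; check the plugin headers for the real parameter lists. Note that handlers bound with `AddDynamic` must be `UFUNCTION`s:

```cpp
// Sketch: assumed delegate bindings on the LLM instance.
void AMyLLMActor::BindLLMEvents()
{
    LLM->OnTokenGenerated.AddDynamic(this, &AMyLLMActor::HandleToken);
    LLM->OnGenerationComplete.AddDynamic(this, &AMyLLMActor::HandleComplete);
    LLM->OnError.AddDynamic(this, &AMyLLMActor::HandleError);
}

void AMyLLMActor::HandleToken(const FString& Token)
{
    // All callbacks fire on the game thread, so it is safe to
    // touch UI or actor state here.
    UE_LOG(LogTemp, Log, TEXT("Token: %s"), *Token);
}
```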

Send Messages

Once a model is loaded, send a user message to generate a response:

Send Message

To override the system prompt for a specific message, use Send Message With System Prompt:

Send Message With System Prompt

Tokens stream through OnTokenGenerated as they are produced. When generation finishes, OnGenerationComplete fires with the full response, duration, token count, and tokens-per-second.
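A C++ sketch of the two send calls; the function names mirror the Blueprint nodes, but the exact signatures (including the argument order of the system-prompt variant) are assumptions:

```cpp
// Sketch: assumed C++ mirrors of the Send Message nodes.
LLM->SendMessage(TEXT("Describe the weather in one sentence."));

// Override the system prompt for this message only:
LLM->SendMessageWithSystemPrompt(
    TEXT("Describe the weather in one sentence."),
    TEXT("You are a terse narrator."));
```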

Async Send Message (Blueprint)

The Send LLM Message (Async) node provides dedicated output pins for tokens, completion, and errors:

Async Send Message

Download Models at Runtime

Besides the download-and-load flow described above, you can download a model to disk without loading it. This is useful for pre-caching models in a loading screen or settings menu.

Download Model

A URL-only variant is also available:

Download Model From URL


The Download LLM Model (Async) and Download LLM Model From URL (Async) nodes provide output pins for progress, completion, and errors:

Async Download Model

The OnDownloadProgress delegate reports progress during the download. OnModelDownloaded fires when the file is saved to disk.

To cancel an in-progress download:

Cancel Download

The plugin prevents duplicate downloads automatically - if a download is already in progress for the same model, subsequent calls are ignored.
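Pre-caching from C++ might look like the sketch below. The function and delegate names are assumed from the nodes above, and the URL is a placeholder:

```cpp
// Sketch: download a model to disk without loading it.
LLM->OnDownloadProgress.AddDynamic(this, &AMyLLMActor::HandleProgress);
LLM->OnModelDownloaded.AddDynamic(this, &AMyLLMActor::HandleDownloaded);
LLM->DownloadModelFromURL(TEXT("https://example.com/models/my-model.gguf"));

// Later, e.g. from a "Cancel" button in a settings menu:
LLM->CancelDownload();
```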

Stop Generation

To interrupt an ongoing generation:

Stop Generation

Reset Conversation Context

Clear the conversation history to start a new conversation:

Reset Context

Unload a Model

Free resources when a model is no longer needed:

Unload Model
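The three teardown calls above might be mirrored in C++ as follows (names assumed from the Blueprint nodes):

```cpp
// Sketch: assumed C++ equivalents of the nodes above.
LLM->StopGeneration(); // interrupt an in-flight response
LLM->ResetContext();   // clear history and start a fresh conversation
LLM->UnloadModel();    // free the model's memory when done
```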

Query State

Check the current state of the LLM instance:

Query State

  • Is Model Loaded: True if a model is ready for inference
  • Is Generating: True if generation is in progress
  • Is Busy: True if any operation (loading, generating, downloading) is active
  • Is Downloading: True if a model download is in progress
  • Get Loaded Model Metadata: Returns metadata of the current model
  • Get Applied Inference Params: Returns the parameters applied when loading
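The state queries are typically used as guards before issuing a call. A hedged C++ sketch (function names assumed from the list above):

```cpp
// Sketch: guard a send behind the state queries listed above.
if (LLM && LLM->IsModelLoaded() && !LLM->IsBusy())
{
    LLM->SendMessage(TEXT("Hello"));
}
else if (LLM && LLM->IsDownloading())
{
    UE_LOG(LogTemp, Log, TEXT("Model is still downloading"));
}
```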

Model Library Functions

A set of static utility functions is provided for managing model files on disk. These are useful for building model selection UI or checking model availability at runtime.

Get Downloaded Model Names / Metadata

Get Downloaded Model Names

Get All Downloaded Model Metadata

Check If a Model Is on Disk

Is Model On Disk

Get Model File Path

Get Model File Path

Delete Model Files

Delete Model Files

Get Pre-defined and Available Models

Get Predefined Models

Get All Available Models

Build Metadata from a URL

Construct model metadata from a raw URL (fields are derived from the filename):

Make Metadata From URL
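Since these are static functions, a Blueprint function library is the likely host. The sketch below assumes a library class name (`ULocalLLMLibrary`) and function names matching the nodes above; all of these are guesses:

```cpp
// Sketch: assumed static function library, e.g. for building a
// model-selection UI from the files already on disk.
const TArray<FString> Names = ULocalLLMLibrary::GetDownloadedModelNames();
for (const FString& Name : Names)
{
    if (ULocalLLMLibrary::IsModelOnDisk(Name))
    {
        UE_LOG(LogTemp, Log, TEXT("%s -> %s"),
               *Name, *ULocalLLMLibrary::GetModelFilePath(Name));
    }
}
```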

Utility Functions

A set of helper functions is provided for formatting and error display.

Bytes to Readable String

Converts a byte count to a human-readable string (e.g. "4.07 GB"). Useful for displaying model sizes in UI.

Bytes to Readable String

Format Download Progress

Formats a download progress string like "1.23 GB / 4.07 GB (30.2%)". If the total size is unknown, returns just the received amount.

Format Download Progress
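The formatting these two helpers perform can be sketched in plain C++. This is an illustrative reimplementation, not the plugin's source; the plugin's rounding, unit breakpoints, and exact output may differ:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative stand-in for "Bytes to Readable String": picks the
// largest fitting unit and prints two decimal places.
std::string BytesToReadableString(uint64_t Bytes)
{
    const char* Units[] = {"B", "KB", "MB", "GB", "TB"};
    double Value = static_cast<double>(Bytes);
    int UnitIndex = 0;
    while (Value >= 1024.0 && UnitIndex < 4)
    {
        Value /= 1024.0;
        ++UnitIndex;
    }
    char Buffer[64];
    if (UnitIndex == 0)
        std::snprintf(Buffer, sizeof(Buffer), "%llu B",
                      static_cast<unsigned long long>(Bytes));
    else
        std::snprintf(Buffer, sizeof(Buffer), "%.2f %s", Value, Units[UnitIndex]);
    return Buffer;
}

// Illustrative stand-in for "Format Download Progress":
// "received / total (pct%)", or just the received amount when the
// total size is unknown (here signalled by Total == 0).
std::string FormatDownloadProgress(uint64_t Received, uint64_t Total)
{
    if (Total == 0)
        return BytesToReadableString(Received);
    char Buffer[128];
    std::snprintf(Buffer, sizeof(Buffer), "%s / %s (%.1f%%)",
                  BytesToReadableString(Received).c_str(),
                  BytesToReadableString(Total).c_str(),
                  100.0 * static_cast<double>(Received) / static_cast<double>(Total));
    return Buffer;
}
```

For example, `FormatDownloadProgress(512, 1024)` yields "512 B / 1.00 KB (50.0%)" with this sketch.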

Get Error Description / Error Code String

Get LLM Error Description returns a human-readable text description for an error code. Get LLM Error Code String returns the enum value name as a string (useful for logging).

Get Error Description

Error Codes Reference

  • Unknown (0): An unspecified error
  • ModelLoadFailed (10): The GGUF file failed to load (corrupt file, incompatible format, etc.)
  • ContextCreateFailed (11): Failed to create the inference context
  • ModelNotLoaded (20): Inference was attempted with no model loaded
  • ChatTemplateFailed (21): The model's chat template failed to apply
  • TokenizationFailed (22): The input text could not be tokenized
  • ContextOverflow (23): The prompt + context exceeds the configured context size
  • PromptDecodeFailed (24): The prompt tokens failed to decode
  • ContextTooFullToGenerate (25): Not enough context space remaining to generate output
  • GenerationDecodeFailed (30): A token failed to decode during generation
  • GenerationTruncated (31): Generation stopped because the max token limit was reached
  • LLMInstanceNull (40): The LLM instance is null or invalid
  • ModelNotFoundOnDisk (41): The model file does not exist at the expected path
  • ModelURLEmpty (42): A download was requested with an empty URL
  • ModelDownloadCancelled (43): The download was cancelled
  • ModelDownloadEmptyData (44): The download completed but the response body was empty
  • ModelDownloadSaveFailed (45): The download completed but the file could not be saved to disk