How to use the plugin
The documentation you are currently viewing is for a plugin that has not yet been released. Content may be incomplete or subject to change. Please check back once the plugin is officially available on the Fab marketplace.
This guide covers the full runtime API: creating an LLM instance, loading models, sending messages, downloading models at runtime, managing state, and utility functions.
Create an LLM Instance
Start by creating a Runtime Local LLM object. Maintain a reference to it (e.g. as a variable in Blueprints or a UPROPERTY in C++) to prevent premature garbage collection.
- Blueprint
- C++

UPROPERTY()
URuntimeLocalLLM* LLM;
LLM = URuntimeLocalLLM::CreateRuntimeLocalLLM();
Load a Model
You must load a model before sending messages. The plugin provides several loading methods depending on your workflow.
Load by Name
If you manage models through the editor settings panel, use Load Model (By Name).
- Blueprint
- C++
- UE 5.3 and earlier
- UE 5.4+
In UE 5.3 and earlier the dropdown does not appear, so you need to retrieve the available models manually. Use Get All Downloaded Model Metadata, get the element at index 0 (or whichever model you need), pass it to Get Model File Name to retrieve the name string, then pass that to Load Model (By Name).

In UE 5.4 and later, Load Model (By Name) presents a dropdown of all models on disk - simply select the model you want to load.

In C++, use GetAllDownloadedModelMetadata to retrieve available models and GetModelFileName to get the name to pass to LoadModelByName:
FLLMInferenceParams Params;
Params.MaxTokens = 512;
Params.Temperature = 0.7f;
Params.SystemPrompt = TEXT("You are a helpful assistant.");
TArray<FLLMModelMetadata> DownloadedModels = URuntimeLLMLibrary::GetAllDownloadedModelMetadata();
if (DownloadedModels.Num() > 0)
{
    const FLLMModelMetadata& Model = DownloadedModels[0]; // Select the first available model
    FString ModelFileName = URuntimeLLMLibrary::GetModelFileName(Model);
    LLM->LoadModelByName(FName(*ModelFileName), Params);
}
Load from File Path
Load a model directly from an absolute file path to a .gguf file:
- Blueprint
- C++

FLLMInferenceParams Params;
LLM->LoadModelFromFile(TEXT("/path/to/model.gguf"), Params);
Load from URL (Download and Load)
Download a model from a URL (if not already on disk) and load it automatically. If the file already exists locally, the download is skipped.
- Blueprint
- C++
The simplest variant takes only a URL - metadata is derived from the filename:

You can also use Load Model From URL with full model metadata for richer model information:

FLLMInferenceParams Params;
// Simple: URL only - metadata is derived from the filename
LLM->LoadModelFromURLSimple(
    TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf"), Params);
// With full metadata
FLLMModelMetadata Metadata;
Metadata.ModelFamilyName = TEXT("Llama3_2_1B_Instruct");
Metadata.ModelDisplayName = TEXT("Llama 3.2 1B Instruct");
Metadata.Description = TEXT("Meta's Llama 3.2 1B parameter instruction-tuned model. Lightweight and fast, suitable for simple tasks.");
Metadata.ParameterCount = TEXT("1B");
Metadata.Variant.VariantName = TEXT("Q4_K_M");
Metadata.Variant.ModelURL = TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf");
Metadata.Variant.ApproximateSizeBytes = 776LL * 1024 * 1024;
Metadata.Variant.QuantizationType = ELLMQuantizationType::Q4_K_M;
LLM->LoadModelFromURL(Metadata, Params);
Async Load (Blueprint)
To handle load completion and errors via output pins instead of binding delegates manually, two async nodes are available.
Load Model By Name (Async) mirrors Load Model (By Name) - in UE 5.4+ it presents a dropdown of all models on disk:
- UE 5.4+
- UE 5.3 and earlier

In UE 5.3 and earlier the dropdown does not appear. Use Get All Downloaded Model Metadata, get the element at index 0 (or whichever model you need), pass it to Get Model File Name, then pass that to Load Model By Name (Async).

Load Model From File (Async) takes an absolute file path instead:

Bind Events
Bind to the LLM instance's delegates to receive callbacks. All callbacks fire on the game thread.
- Blueprint
- C++

Available delegates:
- On Token Generated: Fires for each output token
- On Generation Complete: Fires when the full response is ready, with duration, token count, and tokens-per-second
- On Prompt Processed: Fires after the input prompt is processed, before generation begins
- On Error: Fires if an error occurs during any operation
- On Model Loaded: Fires when a model finishes loading
- On Model Unloaded: Fires when the model is unloaded
- On Download Progress: Fires periodically during a model download (progress fraction, bytes received, total bytes)
- On Model Downloaded: Fires when a download-only operation completes
LLM->OnTokenGeneratedNative.AddLambda([](const FString& Token)
{
    // Fires for each output token as it is generated
});
LLM->OnGenerationCompleteNative.AddLambda([](const FString& FullResponse)
{
    // Fires when the full response is ready
});
LLM->OnPromptProcessedNative.AddLambda([]()
{
    // Fires after the input prompt is processed, before generation begins
});
LLM->OnErrorNative.AddLambda([](const FString& ErrorMessage)
{
    // Fires if an error occurs during any operation
});
LLM->OnModelLoadedNative.AddLambda([](const FString& ModelName)
{
    // Fires when a model finishes loading
});
LLM->OnModelUnloadedNative.AddLambda([](const FString& ModelName)
{
    // Fires when the model is unloaded
});
LLM->OnDownloadProgressNative.AddLambda([](const FString& ModelName, float Progress)
{
    // Fires periodically during a model download
});
LLM->OnModelDownloadedNative.AddLambda([](const FString& ModelName)
{
    // Fires when a download-only operation completes
});
Send Messages
Once a model is loaded, send a user message to generate a response:
- Blueprint
- C++

To override the system prompt for a specific message, use Send Message With System Prompt:

LLM->SendMessage(TEXT("Tell me a short story about a brave knight."));
// With a custom system prompt override
LLM->SendMessageWithSystemPrompt(
    TEXT("Translate this to French: Hello world"),
    TEXT("You are a professional translator.")
);
Tokens stream through OnTokenGenerated as they are produced. When generation finishes, OnGenerationComplete fires with the full response, duration, token count, and tokens-per-second.
Async Send Message (Blueprint)
The Send LLM Message (Async) node provides dedicated output pins for tokens, completion, and errors:

Download Models at Runtime
Besides the download-and-load flow described above, you can download a model to disk without loading it. This is useful for pre-caching models in a loading screen or settings menu.
- Blueprint
- C++

A URL-only variant is also available:

The Download LLM Model (Async) and Download LLM Model From URL (Async) nodes provide output pins for progress, completion, and errors:

// With full metadata
FLLMModelMetadata Metadata;
Metadata.ModelFamilyName = TEXT("Llama3_2_1B_Instruct");
Metadata.ModelDisplayName = TEXT("Llama 3.2 1B Instruct");
Metadata.Description = TEXT("Meta's Llama 3.2 1B parameter instruction-tuned model. Lightweight and fast, suitable for simple tasks.");
Metadata.ParameterCount = TEXT("1B");
Metadata.Variant.VariantName = TEXT("Q4_K_M");
Metadata.Variant.ModelURL = TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf");
Metadata.Variant.ApproximateSizeBytes = 776LL * 1024 * 1024;
Metadata.Variant.QuantizationType = ELLMQuantizationType::Q4_K_M;
LLM->DownloadModel(Metadata);
// URL only
LLM->DownloadModelFromURL(
    TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf"));
The OnDownloadProgress delegate reports progress during the download. OnModelDownloaded fires when the file is saved to disk.
To cancel an in-progress download:
- Blueprint
- C++

LLM->CancelDownload();
The plugin prevents duplicate downloads automatically - if a download is already in progress for the same model, subsequent calls are ignored.
Stop Generation
To interrupt an ongoing generation:
- Blueprint
- C++

LLM->StopGeneration();
Reset Conversation Context
Clear the conversation history to start a new conversation:
- Blueprint
- C++

// Keep the system prompt
LLM->ResetContext(true);
// Clear everything including the system prompt
LLM->ResetContext(false);
Unload a Model
Free resources when a model is no longer needed:
- Blueprint
- C++

LLM->UnloadModel();
Query State
Check the current state of the LLM instance:
- Blueprint
- C++

- Is Model Loaded: True if a model is ready for inference
- Is Generating: True if generation is in progress
- Is Busy: True if any operation (loading, generating, downloading) is active
- Is Downloading: True if a model download is in progress
- Get Loaded Model Metadata: Returns metadata of the current model
- Get Applied Inference Params: Returns the parameters applied when loading
// Is Model Loaded - true if a model is ready for inference
if (LLM->IsModelLoaded())
{
    FLLMModelMetadata Metadata = LLM->GetLoadedModelMetadata();
    UE_LOG(LogTemp, Log, TEXT("Model: %s"), *Metadata.ModelDisplayName);
    FLLMInferenceParams Params = LLM->GetAppliedInferenceParams();
    UE_LOG(LogTemp, Log, TEXT("Context size: %d"), Params.ContextSize);
}
// Is Generating - true if token generation is currently active
if (LLM->IsGenerating())
{
    UE_LOG(LogTemp, Log, TEXT("Generation in progress..."));
}
// Is Busy - true if any operation (loading, generating, downloading) is active
if (LLM->IsBusy())
{
    UE_LOG(LogTemp, Log, TEXT("LLM is busy, deferring request"));
}
// Is Downloading - true if a model download is currently in progress
if (LLM->IsDownloading())
{
    UE_LOG(LogTemp, Log, TEXT("Model download in progress..."));
}
// Safe to send a new message or load a different model
if (!LLM->IsGenerating() && !LLM->IsBusy())
{
    UE_LOG(LogTemp, Log, TEXT("LLM is idle and ready"));
}
Model Library Functions
A set of static utility functions is provided for managing model files on disk. These are useful for building model selection UI or checking model availability at runtime.
Get Downloaded Model Names / Metadata
- Blueprint
- C++


TArray<FName> ModelNames = URuntimeLLMLibrary::GetDownloadedModelNames();
TArray<FLLMModelMetadata> AllModels = URuntimeLLMLibrary::GetAllDownloadedModelMetadata();
for (const FLLMModelMetadata& Model : AllModels)
{
    UE_LOG(LogTemp, Log, TEXT("Model: %s (%s)"), *Model.ModelDisplayName, *Model.Variant.VariantName);
}
Check If a Model Is on Disk
- Blueprint
- C++

bool bExists = URuntimeLLMLibrary::IsModelOnDisk(Metadata);
Get Model File Path
- Blueprint
- C++

FString FilePath = URuntimeLLMLibrary::GetModelFilePath(Metadata);
Delete Model Files
- Blueprint
- C++

bool bDeleted = URuntimeLLMLibrary::DeleteModelFiles(Metadata);
Get Pre-defined and Available Models
- Blueprint
- C++


// Built-in catalog only
TArray<FLLMModelFamily> Predefined = URuntimeLLMLibrary::GetPredefinedModels();
// Catalog + custom imports
TArray<FLLMModelFamily> All = URuntimeLLMLibrary::GetAllAvailableModels();
Build Metadata from a URL
Construct model metadata from a raw URL (fields are derived from the filename):
- Blueprint
- C++

FLLMModelMetadata Metadata = URuntimeLocalLLM::MakeMetadataFromURL(
    TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf")
);
Utility Functions
A set of helper functions is provided for formatting and error display.
Bytes to Readable String
Converts a byte count to a human-readable string (e.g. "4.07 GB"). Useful for displaying model sizes in UI.

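This guide does not show a code sample for this helper. As a rough illustration of the format it produces, here is a standalone sketch; `BytesToReadable` is a hypothetical stand-in written for this example, not the plugin's actual function, and its exact output may differ from the plugin's:

```cpp
#include <cstdio>
#include <cstdint>
#include <string>

// Standalone sketch: convert a byte count to a human-readable string
// such as "776.00 MB" or "4.07 GB", stepping through binary units.
std::string BytesToReadable(uint64_t Bytes)
{
    const char* Units[] = { "B", "KB", "MB", "GB", "TB" };
    double Value = static_cast<double>(Bytes);
    int UnitIndex = 0;
    while (Value >= 1024.0 && UnitIndex < 4)
    {
        Value /= 1024.0;
        ++UnitIndex;
    }
    char Buffer[32];
    // Whole bytes need no decimals; larger units get two decimal places
    std::snprintf(Buffer, sizeof(Buffer),
                  UnitIndex == 0 ? "%.0f %s" : "%.2f %s", Value, Units[UnitIndex]);
    return Buffer;
}
```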
Format Download Progress
Formats a download progress string like "1.23 GB / 4.07 GB (30.2%)". If the total size is unknown, returns just the received amount.

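As with the previous helper, a standalone sketch of the described format may help; `FormatDownloadProgress` and `ToGB` are hypothetical stand-ins for this example, and unlike the plugin's size helper, `ToGB` always formats in GB for simplicity:

```cpp
#include <cstdio>
#include <cstdint>
#include <string>

// Simplified size formatter that always uses GB (the plugin's helper is unit-aware)
static std::string ToGB(uint64_t Bytes)
{
    char Buf[32];
    std::snprintf(Buf, sizeof(Buf), "%.2f GB", Bytes / (1024.0 * 1024.0 * 1024.0));
    return Buf;
}

// Produce a progress string like "1.00 GB / 4.00 GB (25.0%)";
// when the total size is unknown (0), return only the received amount.
std::string FormatDownloadProgress(uint64_t Received, uint64_t Total)
{
    if (Total == 0)
    {
        return ToGB(Received);
    }
    char Buf[96];
    std::snprintf(Buf, sizeof(Buf), "%s / %s (%.1f%%)",
                  ToGB(Received).c_str(), ToGB(Total).c_str(),
                  100.0 * static_cast<double>(Received) / static_cast<double>(Total));
    return Buf;
}
```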
Get Error Description / Error Code String
Get LLM Error Description returns a human-readable text description for an error code. Get LLM Error Code String returns the enum value name as a string (useful for logging).

Error Codes Reference
| Code | Value | Description |
|---|---|---|
| Unknown | 0 | An unspecified error |
| ModelLoadFailed | 10 | The GGUF file failed to load (corrupt file, incompatible format, etc.) |
| ContextCreateFailed | 11 | Failed to create the inference context |
| ModelNotLoaded | 20 | Inference was attempted with no model loaded |
| ChatTemplateFailed | 21 | The model's chat template failed to apply |
| TokenizationFailed | 22 | The input text could not be tokenized |
| ContextOverflow | 23 | The prompt + context exceeds the configured context size |
| PromptDecodeFailed | 24 | The prompt tokens failed to decode |
| ContextTooFullToGenerate | 25 | Not enough context space remaining to generate output |
| GenerationDecodeFailed | 30 | A token failed to decode during generation |
| GenerationTruncated | 31 | Generation stopped because the max token limit was reached |
| LLMInstanceNull | 40 | The LLM instance is null or invalid |
| ModelNotFoundOnDisk | 41 | The model file does not exist at the expected path |
| ModelURLEmpty | 42 | A download was requested with an empty URL |
| ModelDownloadCancelled | 43 | The download was cancelled |
| ModelDownloadEmptyData | 44 | The download completed but the response body was empty |
| ModelDownloadSaveFailed | 45 | The download completed but the file could not be saved to disk |
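Note that the values in the table above are grouped by category: 1x for model loading, 2x for prompt and context handling, 3x for generation, and 4x for instance, file, and download issues. A standalone sketch of how this grouping might be used (the enum name `ELLMErrorCode` here mirrors the table but is assumed, as is the `IsDownloadError` helper; neither is confirmed plugin API):

```cpp
#include <cstdint>

// Hypothetical mirror of the error-code table above
enum class ELLMErrorCode : uint8_t
{
    Unknown = 0,
    ModelLoadFailed = 10, ContextCreateFailed = 11,
    ModelNotLoaded = 20, ChatTemplateFailed = 21, TokenizationFailed = 22,
    ContextOverflow = 23, PromptDecodeFailed = 24, ContextTooFullToGenerate = 25,
    GenerationDecodeFailed = 30, GenerationTruncated = 31,
    LLMInstanceNull = 40, ModelNotFoundOnDisk = 41, ModelURLEmpty = 42,
    ModelDownloadCancelled = 43, ModelDownloadEmptyData = 44, ModelDownloadSaveFailed = 45
};

// Codes 42-45 relate to downloads, so a download UI might filter on them
bool IsDownloadError(ELLMErrorCode Code)
{
    return Code >= ELLMErrorCode::ModelURLEmpty &&
           Code <= ELLMErrorCode::ModelDownloadSaveFailed;
}
```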