Runtime Local LLM
The documentation you are currently viewing is for a plugin that has not yet been released. Content may be incomplete or subject to change. Please check back once the plugin is officially available on the Fab marketplace.
Documentation for the Runtime Local LLM plugin.
- Plugin Support & Custom Development: [email protected] (tailored solutions for teams & organizations)
Overview
Run large language models entirely on-device in Unreal Engine using llama.cpp. The plugin supports offline inference with GGUF models, token streaming, and a full Blueprint and C++ API across Windows, Mac, Linux, Android, iOS, and Meta Quest.
Managing models in the editor
Browse, download, import, delete, and test LLM models directly in the Unreal Engine editor using the Runtime Local LLM plugin settings panel.
How to use the plugin
Complete runtime API reference for the Runtime Local LLM plugin covering LLM instance creation, model loading, message sending, downloading, state management, model library functions, and utilities.
Examples
Ready-to-use Blueprint and C++ examples for the Runtime Local LLM plugin including simple chat, download-and-chat, model pre-downloading, and NPC dialogue systems.
Inference parameters
Detailed reference for all LLM inference parameters including temperature, top-p, top-k, repeat penalty, GPU layer offloading, context size, seed, and thread count, with platform-specific recommendations for mobile, VR, and desktop.
Demo project
A ready-to-use demo project for the Runtime Local LLM plugin featuring a chat interface with streaming responses, model downloading via URL, and configurable inference parameters.