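How to use the plugin

This guide covers the full runtime API: creating an LLM instance, loading models, sending messages, downloading models at runtime, managing state, and utility functions.

Creating an LLM instance

Start by creating a Runtime Local LLM object. Keep a reference to it, as a variable in Blueprints or as a UPROPERTY in C++, so that it is not garbage collected prematurely.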

Blueprint
C++

Create Runtime Local LLM

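    // In the owning class declaration: UPROPERTY keeps the object visible to the garbage collector.
    UPROPERTY()
    URuntimeLocalLLM* LLM;

    // In the implementation: create the instance and store it in the property.
    LLM = URuntimeLocalLLM::CreateRuntimeLocalLLM();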

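Loading a model

A model must be loaded before you can send messages. The plugin provides several loading methods to suit different workflows.

Load by name

Use Load Model (By Name) if you manage models through the editor settings panel.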

Blueprint
C++

UE 5.3 and earlier
UE 5.4+

In UE 5.3 and earlier the dropdown does not appear, so you have to fetch the available models manually. Call Get All Downloaded Model Metadata, take the element at index 0 (or whichever model you need), pass it to Get Model File Name to obtain the name string, and pass that string to Load Model (By Name).

Load Model By Name UE 5.3

In UE 5.4 and later, Load Model (By Name) shows a dropdown of every model on disk. Simply select the model you want to load.

Load Model By Name UE 5.4+

In C++, use GetAllDownloadedModelMetadata to fetch the available models and GetModelFileName to get the name to pass to LoadModelByName:

    FLLMInferenceParams Params;
    Params.MaxTokens = 512;
    Params.Temperature = 0.7f;
    Params.SystemPrompt = TEXT("You are a helpful assistant.");

    TArray<FLLMModelMetadata> DownloadedModels = URuntimeLLMLibrary::GetAllDownloadedModelMetadata();

    if (DownloadedModels.Num() > 0)
    {
        const FLLMModelMetadata& Model = DownloadedModels[0]; // Select the first available model
        FString ModelFileName = URuntimeLLMLibrary::GetModelFileName(Model);
        LLM->LoadModelByName(FName(*ModelFileName), Params);
    }

Load from a file path

Load a model directly from the absolute path of a .gguf file:

Blueprint
C++

Load Model From File

    FLLMInferenceParams Params;
    LLM->LoadModelFromFile(TEXT("/path/to/model.gguf"), Params);

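Load from URL (download and load)

Downloads the model from a URL (if it is not already on disk) and loads it automatically. If the file already exists locally, the download is skipped.

Blueprint
C++

The simplest variant takes only a URL and infers the metadata from the file name.

Load Model From URL Simple

For richer model information, you can also use Load Model From URL with full model metadata.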

Load Model From URL

    FLLMInferenceParams Params;

    // Simple: URL only - metadata is derived from the filename
    LLM->LoadModelFromURLSimple(
        TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf"), Params);

    // With full metadata
    FLLMModelMetadata Metadata;
    Metadata.ModelFamilyName = TEXT("Llama3_2_1B_Instruct");
    Metadata.ModelDisplayName = TEXT("Llama 3.2 1B Instruct");
    Metadata.Description = TEXT("Meta's Llama 3.2 1B parameter instruction-tuned model. Lightweight and fast, suitable for simple tasks.");
    Metadata.ParameterCount = TEXT("1B");
    Metadata.Variant.VariantName = TEXT("Q4_K_M");
    Metadata.Variant.ModelURL = TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf");
    Metadata.Variant.ApproximateSizeBytes = 776LL * 1024 * 1024;
    Metadata.Variant.QuantizationType = ELLMQuantizationType::Q4_K_M;
    LLM->LoadModelFromURL(Metadata, Params);

Async loading (Blueprint)

Two async nodes are available that handle load completion and errors through output pins instead of manually bound delegates.

Load Model By Name (Async) mirrors Load Model (By Name): in UE 5.4+ it offers a dropdown of every model on disk:

UE 5.4+
UE 5.3 and earlier

Load Model By Name (Async) UE 5.4+

In UE 5.3 and earlier the dropdown does not appear. Use Get All Downloaded Model Metadata, take the element at index 0 (or whichever model you need), pass it to Get Model File Name, and pass the result to Load Model By Name (Async).

Load Model By Name (Async) UE 5.3

Load Model From File (Async) takes an absolute file path instead:

Load Model From File (Async)

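Binding to events

Bind to the delegates on the LLM instance to receive callbacks. All callbacks fire on the game thread.

Blueprint
C++

Bind Events

Available delegates:

On Token Generated: fires for each output token
On Generation Complete: fires when the full response is ready; includes the duration, token count, and tokens per second
On Prompt Processed: fires after the input prompt has been processed and before generation starts
On Error: fires when an operation fails
On Model Loaded: fires when model loading completes
On Model Unloaded: fires when the model is unloaded
On Download Progress: fires periodically during a model download (progress ratio, bytes received, total bytes)
On Model Downloaded: fires when a download-only operation completes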

    LLM->OnTokenGeneratedNative.AddLambda([](const FString& Token)
    {
    });

    LLM->OnGenerationCompleteNative.AddLambda([](const FString& FullResponse)
    {
    });

    LLM->OnPromptProcessedNative.AddLambda([]()
    {
    });

    LLM->OnErrorNative.AddLambda([](const FString& ErrorMessage)
    {
    });

    LLM->OnModelLoadedNative.AddLambda([](const FString& ModelName)
    {
    });

    LLM->OnModelUnloadedNative.AddLambda([](const FString& ModelName)
    {
    });

    LLM->OnDownloadProgressNative.AddLambda([](const FString& ModelName, float Progress)
    {
    });

    LLM->OnModelDownloadedNative.AddLambda([](const FString& ModelName)
    {
    });

Sending messages

Once a model is loaded, send a user message to generate a response:

Blueprint
C++

Send Message

To override the system prompt for a specific message, use Send Message With System Prompt:

Send Message With System Prompt

    LLM->SendMessage(TEXT("Tell me a short story about a brave knight."));

    // With a custom system prompt override
    LLM->SendMessageWithSystemPrompt(
        TEXT("Translate this to French: Hello world"),
        TEXT("You are a professional translator.")
    );

Tokens stream through OnTokenGenerated as they are generated. When generation finishes, OnGenerationComplete fires with the full response, duration, token count, and tokens per second.
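
To make the relationship between the streamed tokens and the completion statistics concrete, here is a minimal standalone sketch in plain C++ (no Unreal types). TokenCollector and GenerationStats are hypothetical names, and computing tokens per second as token count divided by duration is an assumption, not a description of the plugin's internals.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the values reported when generation completes.
struct GenerationStats
{
    std::string FullResponse;
    int TokenCount = 0;
    double DurationSeconds = 0.0;
    double TokensPerSecond = 0.0;
};

// Collects streamed tokens; illustrative only, not part of the plugin.
class TokenCollector
{
public:
    // Call once per streamed token (what an OnTokenGenerated handler might do).
    void OnToken(const std::string& Token)
    {
        Response += Token;
        ++Count;
    }

    // Call when generation ends, passing the measured duration in seconds.
    GenerationStats Finish(double DurationSeconds) const
    {
        assert(DurationSeconds >= 0.0);
        GenerationStats Stats;
        Stats.FullResponse = Response;
        Stats.TokenCount = Count;
        Stats.DurationSeconds = DurationSeconds;
        // Avoid dividing by zero for empty or instantaneous runs.
        Stats.TokensPerSecond = DurationSeconds > 0.0 ? Count / DurationSeconds : 0.0;
        return Stats;
    }

private:
    std::string Response;
    int Count = 0;
};
```

In a UI, a token handler would typically append each token to a text widget, and the completion handler would replace it with the final response.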

Sending messages asynchronously (Blueprint)

The Send LLM Message (Async) node provides dedicated output pins for tokens, completion, and errors:

Send LLM Message (Async)

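Downloading models at runtime

In addition to the download-and-load flow described above, you can download a model to disk without loading it. This is useful for pre-caching models from a loading screen or a settings menu.

Blueprint
C++

Download Model

A URL-only variant is also available:

Download Model From URL

The Download LLM Model (Async) and Download LLM Model From URL (Async) nodes provide output pins for progress, completion, and errors:

Download LLM Model (Async)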

    // With full metadata
    FLLMModelMetadata Metadata;
    Metadata.ModelFamilyName = TEXT("Llama3_2_1B_Instruct");
    Metadata.ModelDisplayName = TEXT("Llama 3.2 1B Instruct");
    Metadata.Description = TEXT("Meta's Llama 3.2 1B parameter instruction-tuned model. Lightweight and fast, suitable for simple tasks.");
    Metadata.ParameterCount = TEXT("1B");
    Metadata.Variant.VariantName = TEXT("Q4_K_M");
    Metadata.Variant.ModelURL = TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf");
    Metadata.Variant.ApproximateSizeBytes = 776LL * 1024 * 1024;
    Metadata.Variant.QuantizationType = ELLMQuantizationType::Q4_K_M;
    LLM->DownloadModel(Metadata);

    // URL only
    LLM->DownloadModelFromURL(
        TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf"));

The OnDownloadProgress delegate reports download progress. OnModelDownloaded fires once the file has been saved to disk.

To cancel an in-progress download:

Blueprint
C++

Cancel Download

    LLM->CancelDownload();

The plugin prevents duplicate downloads automatically: if a download for the same model is already in progress, subsequent calls are ignored.
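
One common way to get this kind of behavior is to track in-flight downloads in a set. The sketch below is standalone C++ and only illustrates the idea: DownloadTracker is a hypothetical class, keying on the URL is an assumption, and the plugin may identify models differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical tracker for in-flight downloads; not the plugin's implementation.
class DownloadTracker
{
public:
    // Returns true if the download may start, false if one is already in flight for this URL.
    bool TryBegin(const std::string& Url)
    {
        // insert() reports whether the key was newly added.
        return InFlight.insert(Url).second;
    }

    // Call on completion, failure, or cancellation so the model can be requested again.
    void End(const std::string& Url)
    {
        InFlight.erase(Url);
    }

    bool IsDownloading(const std::string& Url) const
    {
        return InFlight.count(Url) > 0;
    }

private:
    std::unordered_set<std::string> InFlight;
};
```

A caller would start the transfer only when TryBegin returns true and call End from every exit path, including cancellation.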

Stopping generation

To interrupt an in-progress generation:

Blueprint
C++

Stop Generation

    LLM->StopGeneration();

Resetting the conversation context

Clear the conversation history to start a new conversation:

Blueprint
C++

Reset Context

    // Keep the system prompt
    LLM->ResetContext(true);

    // Clear everything including the system prompt
    LLM->ResetContext(false);
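
The difference between the two calls can be pictured with a small standalone model of a conversation. This is a sketch under assumptions: Conversation and ChatMessage are hypothetical types, and the plugin's internal representation of history is not documented here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message record; roles follow the usual system/user/assistant convention.
struct ChatMessage
{
    std::string Role;
    std::string Content;
};

// Illustrative conversation state, not the plugin's implementation.
class Conversation
{
public:
    explicit Conversation(std::string InSystemPrompt) : SystemPrompt(std::move(InSystemPrompt)) {}

    void AddUser(const std::string& Text) { History.push_back({"user", Text}); }
    void AddAssistant(const std::string& Text) { History.push_back({"assistant", Text}); }

    // History is always cleared; the system prompt survives only when requested.
    void Reset(bool bKeepSystemPrompt)
    {
        History.clear();
        if (!bKeepSystemPrompt)
        {
            SystemPrompt.clear();
        }
    }

    // Messages that would be passed to the model on the next turn.
    std::vector<ChatMessage> BuildPrompt() const
    {
        std::vector<ChatMessage> Messages;
        if (!SystemPrompt.empty())
        {
            Messages.push_back({"system", SystemPrompt});
        }
        Messages.insert(Messages.end(), History.begin(), History.end());
        return Messages;
    }

private:
    std::string SystemPrompt;
    std::vector<ChatMessage> History;
};
```

Keeping the system prompt suits a "new chat" button for the same character or assistant, while clearing everything suits switching to a different persona.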

Unloading the model

Release resources when the model is no longer needed:

Blueprint
C++

Unload Model

    LLM->UnloadModel();

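Querying state

Check the current state of the LLM instance:

Blueprint
C++

Query State

Is Model Loaded: True when a model is ready for inference
Is Generating: True while generation is in progress
Is Busy: True while any operation (loading, generating, downloading) is active
Is Downloading: True while a model download is in progress
Get Loaded Model Metadata: returns the metadata of the current model
Get Applied Inference Params: returns the parameters that were applied at load time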

    // Is Model Loaded - true if a model is ready for inference
    if (LLM->IsModelLoaded())
    {
        FLLMModelMetadata Metadata = LLM->GetLoadedModelMetadata();
        UE_LOG(LogTemp, Log, TEXT("Model: %s"), *Metadata.ModelDisplayName);

        FLLMInferenceParams Params = LLM->GetAppliedInferenceParams();
        UE_LOG(LogTemp, Log, TEXT("Context size: %d"), Params.ContextSize);
    }

    // Is Generating - true if token generation is currently active
    if (LLM->IsGenerating())
    {
        UE_LOG(LogTemp, Log, TEXT("Generation in progress..."));
    }

    // Is Busy - true if any operation (loading, generating, downloading) is active
    if (LLM->IsBusy())
    {
        UE_LOG(LogTemp, Log, TEXT("LLM is busy, deferring request"));
    }

    // Is Downloading - true if a model download is currently in progress
    if (LLM->IsDownloading())
    {
        UE_LOG(LogTemp, Log, TEXT("Model download in progress..."));
    }

    // Safe to send a new message or load a different model
    if (!LLM->IsGenerating() && !LLM->IsBusy())
    {
        UE_LOG(LogTemp, Log, TEXT("LLM is idle and ready"));
    }
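
Going by the descriptions above, Is Busy already covers generation, so an idle check can rest on Is Busy alone. The standalone sketch below shows how these flags might combine when gating requests. LLMStateSketch, CanSendMessage, and CanLoadModel are hypothetical names, and the gating rules are assumptions rather than documented plugin behavior.

```cpp
#include <cassert>

// Hypothetical mirror of the instance state; field names are illustrative.
struct LLMStateSketch
{
    bool bModelLoaded = false;
    bool bLoading = false;
    bool bGenerating = false;
    bool bDownloading = false;

    // Busy covers every long-running operation, including generation.
    bool IsBusy() const { return bLoading || bGenerating || bDownloading; }

    // Assumed rule: a message needs a loaded model and an idle instance.
    bool CanSendMessage() const { return bModelLoaded && !IsBusy(); }

    // Assumed rule: loading a different model only needs an idle instance.
    bool CanLoadModel() const { return !IsBusy(); }
};
```

A UI could use such checks to enable or disable the send button instead of letting a request fail with ModelNotLoaded.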

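Model library functions

A set of static utility functions is provided for managing model files on disk. They are useful for building a model selection UI or checking model availability at runtime.

Get downloaded model names / metadata

Blueprint
C++

Get Downloaded Model Names

Get All Downloaded Model Metadata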

    TArray<FName> ModelNames = URuntimeLLMLibrary::GetDownloadedModelNames();

    TArray<FLLMModelMetadata> AllModels = URuntimeLLMLibrary::GetAllDownloadedModelMetadata();
    for (const FLLMModelMetadata& Model : AllModels)
    {
        UE_LOG(LogTemp, Log, TEXT("Model: %s (%s)"), *Model.ModelDisplayName, *Model.Variant.VariantName);
    }

Check whether a model is on disk

Blueprint
C++

Is Model On Disk

    bool bExists = URuntimeLLMLibrary::IsModelOnDisk(Metadata);

Get the model file path

Blueprint
C++

Get Model File Path

    FString FilePath = URuntimeLLMLibrary::GetModelFilePath(Metadata);

Delete model files

Blueprint
C++

Delete Model Files

    bool bDeleted = URuntimeLLMLibrary::DeleteModelFiles(Metadata);

Get predefined and available models

Blueprint
C++

Get Predefined Models

Get All Available Models

    // Built-in catalog only
    TArray<FLLMModelFamily> Predefined = URuntimeLLMLibrary::GetPredefinedModels();

    // Catalog + custom imports
    TArray<FLLMModelFamily> All = URuntimeLLMLibrary::GetAllAvailableModels();

Build metadata from a URL

Constructs model metadata from a raw URL (the fields are derived from the file name):

Blueprint
C++

Make Metadata From URL

    FLLMModelMetadata Metadata = URuntimeLocalLLM::MakeMetadataFromURL(
        TEXT("https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf")
    );
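
How the fields are derived from the file name is not spelled out here. As a rough illustration, the standalone sketch below extracts the file name from a URL and guesses the quantization tag from the text after the last hyphen. All three function names are hypothetical, and the parsing rules are assumptions that may differ from what MakeMetadataFromURL actually does.

```cpp
#include <cassert>
#include <string>

// Returns the last path segment of a URL, ignoring any query string or fragment.
std::string FileNameFromUrl(const std::string& Url)
{
    // Cut at the first '?' or '#'; substr(0, npos) keeps the whole string if neither is present.
    const std::size_t End = Url.find_first_of("?#");
    const std::string Path = Url.substr(0, End);
    const std::size_t Slash = Path.find_last_of('/');
    return Slash == std::string::npos ? Path : Path.substr(Slash + 1);
}

// Removes a trailing ".gguf" extension if present (case-sensitive in this sketch).
std::string StripGgufExtension(const std::string& FileName)
{
    const std::string Ext = ".gguf";
    if (FileName.size() >= Ext.size() &&
        FileName.compare(FileName.size() - Ext.size(), Ext.size(), Ext) == 0)
    {
        return FileName.substr(0, FileName.size() - Ext.size());
    }
    return FileName;
}

// Guesses the quantization tag as the text after the last '-' in the stem.
std::string GuessVariant(const std::string& Stem)
{
    const std::size_t Dash = Stem.find_last_of('-');
    return Dash == std::string::npos ? std::string() : Stem.substr(Dash + 1);
}
```

For the URL used above, this yields the file name Llama-3.2-1B-Instruct-Q4_K_M.gguf and the variant guess Q4_K_M. File names that do not follow the name-hyphen-quantization pattern would need different handling.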

Utility functions

A set of helper functions is provided for formatting and error display.

Bytes to Readable String

Converts a byte count into a human-readable string (for example, "4.07 GB"). Useful for showing model sizes in the UI.

Bytes to Readable String
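
For readers who want to see what such a conversion involves, here is a minimal standalone sketch. It assumes binary units (1 KB = 1024 bytes) and two decimal places; the plugin's exact unit base and rounding are not stated in this guide, and BytesToReadable is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a byte count using binary units and two decimals (assumed conventions).
std::string BytesToReadable(std::uint64_t Bytes)
{
    static const char* Units[] = {"B", "KB", "MB", "GB", "TB"};
    double Value = static_cast<double>(Bytes);
    int Unit = 0;
    // Step up one unit at a time until the value drops below 1024 or units run out.
    while (Value >= 1024.0 && Unit < 4)
    {
        Value /= 1024.0;
        ++Unit;
    }

    char Buffer[32];
    if (Unit == 0)
    {
        // Plain byte counts are printed without decimals.
        std::snprintf(Buffer, sizeof(Buffer), "%llu B", static_cast<unsigned long long>(Bytes));
    }
    else
    {
        std::snprintf(Buffer, sizeof(Buffer), "%.2f %s", Value, Units[Unit]);
    }
    return Buffer;
}
```

With these assumptions, the 776 MB approximate size used in the metadata examples formats as "776.00 MB".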

Format Download Progress

Formats a download progress string such as "1.23 GB / 4.07 GB (30.2%)". If the total size is unknown, only the amount received is returned.

Format Download Progress
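
The behavior described above can be approximated with the standalone sketch below. It is an illustration under assumptions: ReadableSize and FormatDownloadProgress are hypothetical helpers, a total of 0 stands for "size unknown", and binary units with two decimals are assumed for sizes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Compact size formatter (binary units, two decimals); an assumed convention.
static std::string ReadableSize(std::uint64_t Bytes)
{
    static const char* Units[] = {"B", "KB", "MB", "GB", "TB"};
    double Value = static_cast<double>(Bytes);
    int Unit = 0;
    while (Value >= 1024.0 && Unit < 4)
    {
        Value /= 1024.0;
        ++Unit;
    }
    char Buffer[32];
    std::snprintf(Buffer, sizeof(Buffer), "%.2f %s", Value, Units[Unit]);
    return Buffer;
}

// Builds "received / total (percent)" or just "received" when the total is unknown.
std::string FormatDownloadProgress(std::uint64_t Received, std::uint64_t Total)
{
    // In this sketch, Total == 0 means the server did not report a size.
    if (Total == 0)
    {
        return ReadableSize(Received);
    }
    const double Percent = 100.0 * static_cast<double>(Received) / static_cast<double>(Total);
    char Buffer[96];
    std::snprintf(Buffer, sizeof(Buffer), "%s / %s (%.1f%%)",
                  ReadableSize(Received).c_str(), ReadableSize(Total).c_str(), Percent);
    return Buffer;
}
```

A progress handler could pass the received and total byte counts straight into such a function and show the result in a label.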

Get error description / error code string

Get LLM Error Description returns a human-readable description of an error code. Get LLM Error Code String returns the enum value name as a string (useful for logging).

Get Error Description

Error code reference

Code	Value	Description
Unknown	0	Unspecified error
ModelLoadFailed	10	Failed to load the GGUF file (corrupted file, incompatible format, etc.)
ContextCreateFailed	11	Failed to create the inference context
ModelNotLoaded	20	Inference was attempted with no model loaded
ChatTemplateFailed	21	Failed to apply the model's chat template
TokenizationFailed	22	The input text could not be tokenized
ContextOverflow	23	Prompt + context exceeds the configured context size
PromptDecodeFailed	24	Failed to decode the prompt tokens
ContextTooFullToGenerate	25	Not enough context space remains to generate output
GenerationDecodeFailed	30	Failed to decode a token during generation
GenerationTruncated	31	Generation stopped because the maximum token limit was reached
LLMInstanceNull	40	The LLM instance is null or invalid
ModelNotFoundOnDisk	41	The model file does not exist at the expected path
ModelURLEmpty	42	A download was requested with an empty URL
ModelDownloadCancelled	43	The download was cancelled
ModelDownloadEmptyData	44	The download completed but the response body was empty
ModelDownloadSaveFailed	45	The download completed but the file could not be saved to disk
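
The numeric values in the table fall into bands (10s, 20s, 30s, 40s), which can be handy for coarse error handling. The standalone sketch below mirrors the table in a plain C++ enum and groups codes by band. ErrorCodeSketch and ErrorCategory are hypothetical names, and the category labels are an interpretation of the table, not something the plugin defines.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the table above; the plugin's real enum type is not named in this guide.
enum class ErrorCodeSketch : int
{
    Unknown = 0,
    ModelLoadFailed = 10,
    ContextCreateFailed = 11,
    ModelNotLoaded = 20,
    ChatTemplateFailed = 21,
    TokenizationFailed = 22,
    ContextOverflow = 23,
    PromptDecodeFailed = 24,
    ContextTooFullToGenerate = 25,
    GenerationDecodeFailed = 30,
    GenerationTruncated = 31,
    LLMInstanceNull = 40,
    ModelNotFoundOnDisk = 41,
    ModelURLEmpty = 42,
    ModelDownloadCancelled = 43,
    ModelDownloadEmptyData = 44,
    ModelDownloadSaveFailed = 45
};

// Groups codes by numeric band; the labels are this sketch's own interpretation.
std::string ErrorCategory(ErrorCodeSketch Code)
{
    const int Value = static_cast<int>(Code);
    if (Value >= 40) return "instance-or-download";
    if (Value >= 30) return "generation";
    if (Value >= 20) return "prompt";
    if (Value >= 10) return "load";
    return "unknown";
}
```

For example, a handler might suggest resetting the context for errors in the "prompt" band and retrying the download for errors in the "instance-or-download" band.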
