
Here is the log:
2026-04-20 16:59:32
[DEBUG]
Received request: POST to /api/v1/chat with body
{
"model": "google/gemma-4-26b-a4b",
"input": "Also, you can change text and battle animation speeds in the menu, if you want things faster or slower than the default.",
"stream": true,
"store": false,
"system_prompt": "Translate the user's text into Korean. Preserve ev... <Truncated in logs> ... the right place. Return only the translated text.",
"temperature": 1,
"top_p": 0.95,
"top_k": 64,
"min_p": 0.05,
"repeat_penalty": 1.1,
"max_output_tokens": 256
}
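To check whether the failure is on the server side or in the mod that applies the text, one could replay the same request directly and inspect the raw response. A minimal sketch, assuming the server listens on `localhost:8080` (the host/port is not shown in the log), with `stream` switched off so the reply is easier to read by hand; the middle of the `system_prompt` is truncated in the log, so only its visible start and end are reproduced here:

```python
import json
import urllib.request

# Assumption: adjust to wherever the server actually listens.
SERVER = "http://localhost:8080"

def build_payload() -> dict:
    # Same parameters as the logged request body.
    return {
        "model": "google/gemma-4-26b-a4b",
        "input": ("Also, you can change text and battle animation speeds in "
                  "the menu, if you want things faster or slower than the "
                  "default."),
        "stream": False,   # log used true; false makes manual inspection easier
        "store": False,
        # Abridged -- the full prompt is truncated in the logs.
        "system_prompt": ("Translate the user's text into Korean. ... "
                          "Return only the translated text."),
        "temperature": 1,
        "top_p": 0.95,
        "top_k": 64,
        "min_p": 0.05,
        "repeat_penalty": 1.1,
        "max_output_tokens": 256,
    }

def send(payload: dict) -> str:
    # POST the JSON body to the same endpoint the log shows.
    req = urllib.request.Request(
        f"{SERVER}/api/v1/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# print(send(build_payload()))  # would show the raw translation response
```

If the server returns a correct Korean translation here, the problem is likely in how the client applies the result rather than in the model or the request parameters.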
2026-04-20 16:59:32
[INFO]
[google/gemma-4-26b-a4b]
Running api/v1/chat on history with 2 messages.
2026-04-20 16:59:32
[INFO]
[google/gemma-4-26b-a4b]
Streaming response...
2026-04-20 16:59:32
[DEBUG]
LlamaV4::predict slot selection: session_id=<empty> server-selected (LCP/LRU)
2026-04-20 16:59:32
[DEBUG]
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
srv get_availabl: updating prompt cache
srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 103168 tokens, 8589934592 est)
srv get_availabl: prompt cache update took 0.02 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 103168, n_keep = 67, task.n_tokens = 67
slot update_slots: id 3 | task 0 | cache reuse is not supported - ignoring n_cache_reuse = 256
2026-04-20 16:59:32
[DEBUG]
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 63, batch.n_tokens = 63, progress = 0.940298
2026-04-20 16:59:32
[INFO]
[google/gemma-4-26b-a4b]
Prompt processing progress: 0.00%
2026-04-20 16:59:33
[DEBUG]
slot update_slots: id 3 | task 0 | n_tokens = 63, memory_seq_rm [63, end)
2026-04-20 16:59:33
[DEBUG]
slot init_sampler: id 3 | task 0 | init sampler, took 0.01 ms, tokens: text = 67, total = 67
slot update_slots: id 3 | task 0 | prompt processing done, n_tokens = 67, batch.n_tokens = 4
2026-04-20 16:59:33
[INFO]
[google/gemma-4-26b-a4b]
Prompt processing progress: 94.03%
2026-04-20 16:59:33
[DEBUG]
slot update_slots: id 3 | task 0 | created context checkpoint 1 of 32 (pos_min = 0, pos_max = 62, n_tokens = 63, size = 12.306 MiB)
2026-04-20 16:59:33
[INFO]
[google/gemma-4-26b-a4b]
Prompt processing progress: 100.00%
2026-04-20 16:59:48
[DEBUG]
slot print_timing: id 3 | task 0 |
prompt eval time = 1067.35 ms / 67 tokens ( 15.93 ms per token, 62.77 tokens per second)
eval time = 14666.51 ms / 256 tokens ( 57.29 ms per token, 17.45 tokens per second)
total time = 15733.86 ms / 323 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 322, truncated = 0
srv update_slots: all slots are idle
2026-04-20 16:59:48
[INFO]
[google/gemma-4-26b-a4b]
Finished streaming response
2026-04-20 16:59:48
[DEBUG]
LlamaV4: server assigned slot 3 to task 0
That's the output, and the result should come through without issue, but it seems there's a problem where it just isn't being applied.

Here are my settings.
As you can see, the text in the town is in fact being translated.


There is some slight overlap, but that's easy to ignore.
What's strange, though, is that this text isn't being translated.
Judging by the presence of the textscale option, I'm using the version the developer updated.