Local Translation Model Recommendations for 32GB VRAM GPUs

✍️ OpenClawRadar📅 Published: March 26, 2026🔗 Source
Local Translation Model Recommendations for 32GB VRAM GPUs
Ad

A developer with a 32GB VRAM GPU setup (specifically mentioning a 5090) shared practical findings on local translation models optimized for real-time subtitle and word/phrase translation. Their primary language pairs are Swedish-English and Korean-English.

Recommended Models

Based on testing for quality and speed:

  • For overall languages: Unsloth Gemma3 27b Instruct UD, Q6_K_XL
  • For European languages + 11 included (Korean among others): Bartowski Utter Project EuroLLM 22B Instruct 2512, Q8_0

The developer noted these outperformed previous go-to models: Magistral Small 2509 Q8, Gemma 3 27b Q4, Mistral Small 3.2 Q6_K, and GPT_OSS 20b (in that order).

Performance Notes

With these models, they achieved:

  • Subtitle translations with little to no buffering
  • Word-lookup translations within 0-2 seconds

Models That Were Too Slow

  • Qwen3.5 27b Q6
  • HyperCLOVAX SEED Think 32B Q6 (for Korean)
  • Qwen3 32b Q6 (among other Qwen3-3.5 variants)
  • Viking 33b I1 Q4_K_S
Ad

Other Observations

The developer mentioned TranslateGemma models, which they report are "significantly better according to Google than Gemma3 27b at translation," but noted these use user-user prompts rather than system-user format. They haven't tried them firsthand due to this format difference.

For Swedish translation specifically, GPT SW3 20b was noted as "good when it works, which is rarely (refuses to accept my system prompt)."

The developer also mentioned switching to trial Gemini 2.5 Flash and Gemini 2.5 Flash-lite not because local translations were bad, but because they were "still noticing some mistakes." They're debating between Deepseek, OpenAI, Gemini, z.AI, and Claude for cheap translations, with ChatGPT Thinking as their quality bar.

They noted some free API key options via: NVIDIA NIM, Routeway, Kilo, OpenCode, and Puter.js, though they haven't tried them. They did test GLM-4.7-Flash API directly from z.ai, finding it "pretty good, around Gemma 3 27b level or even better," but hit rate limits when doing word lookups on top of subtitle translations.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also