Local Translation Model Recommendations for 32GB VRAM GPUs

A developer with a 32GB VRAM GPU setup (specifically mentioning a 5090) shared practical findings on local translation models optimized for real-time subtitle and word/phrase translation. Their primary language pairs are Swedish-English and Korean-English.
Recommended Models
Based on testing for quality and speed:
- For overall languages: Unsloth Gemma3 27b Instruct UD, Q6_K_XL
- For European languages + 11 included (Korean among others): Bartowski Utter Project EuroLLM 22B Instruct 2512, Q8_0
The developer noted these outperformed previous go-to models: Magistral Small 2509 Q8, Gemma 3 27b Q4, Mistral Small 3.2 Q6_K, and GPT_OSS 20b (in that order).
Performance Notes
With these models, they achieved:
- Subtitle translations with little to no buffering
- Word-lookup translations within 0-2 seconds
Models That Were Too Slow
- Qwen3.5 27b Q6
- HyperCLOVAX SEED Think 32B Q6 (for Korean)
- Qwen3 32b Q6 (among other Qwen3-3.5 variants)
- Viking 33b I1 Q4_K_S
Other Observations
The developer mentioned TranslateGemma models, which they report are "significantly better according to Google than Gemma3 27b at translation," but noted these use user-user prompts rather than system-user format. They haven't tried them firsthand due to this format difference.
For Swedish translation specifically, GPT SW3 20b was noted as "good when it works, which is rarely (refuses to accept my system prompt)."
The developer also mentioned switching to trial Gemini 2.5 Flash and Gemini 2.5 Flash-lite not because local translations were bad, but because they were "still noticing some mistakes." They're debating between Deepseek, OpenAI, Gemini, z.AI, and Claude for cheap translations, with ChatGPT Thinking as their quality bar.
They noted some free API key options via: NVIDIA NIM, Routeway, Kilo, OpenCode, and Puter.js, though they haven't tried them. They did test GLM-4.7-Flash API directly from z.ai, finding it "pretty good, around Gemma 3 27b level or even better," but hit rate limits when doing word lookups on top of subtitle translations.
📖 Read the full source: r/LocalLLaMA
👀 See Also

iOS Shortcut Workaround for Sending iPhone Photos to Cowork via iCloud Sync
A developer created an iOS Shortcut called "PhoPo" that converts iPhone photos to JPEG, resizes them, and saves them to an iCloud-synced folder that Cowork can access, enabling Claude to analyze screenshots and photos from mobile devices.

Using Claude to analyze writing patterns for better custom instructions
A Reddit user describes a method for creating more effective custom instructions by having Claude analyze 10 writing samples to identify concrete patterns like punctuation avoidance and analogy sources, rather than relying on subjective tone descriptions.

Interactive Explainer Maps Claude Code Agent Loop Designs, from Single Calls to Self-Mutating Prompts
An interactive site built with Opus 4.7 visualizes 11 real agent loop designs for Claude Code, from basic calls to agents that rewrite their own prompts, with SVG animations showing memory and loop mechanics.

Claude Code Structure That Survived Multiple Real Projects
A developer shares a Claude Code setup that held up across 2-3 real projects with multiple skills, MCP servers, and agents. Key findings include using CLAUDE MD for consistency, splitting skills by intent, implementing hooks, and keeping context usage under 60%.