Local Book Translation Pipeline Uses Qwen 32B and Mistral 24B with Contextual RAG

✍️ OpenClawRadar | 📅 Published: April 1, 2026 | 🔗 Source

A developer has built a fully local, automated book translation pipeline that turns PDF files into translated ePub books using eight Python scripts. Its multi-step workflow targets two common machine-translation problems: loss of context across a long text and broken formatting.

Workflow Details

The eight scripts cover the following stages, from PDF input to finished ePub:

  • PDF Extraction: Uses Marker to extract content from PDFs while preserving formatting elements like bold text, chapters, and images
  • Text Segmentation: Splits the extracted text into manageable chunks
  • Context Creation: Before translation, sends excerpts from throughout the book to Qwen 32B to generate a "Super Bible", a global glossary covering characters, tone, and atmosphere
  • Translation: Qwen 32B translates each text segment while referencing the Super Bible to maintain consistency
  • Style Editing: Mistral 24B acts as an editor, reviewing Qwen's translations and rewriting them in a more polished literary style
  • Assembly: A final script reassembles all translated segments, reinserts images, and uses Pandoc to output a polished ePub file
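
The segmentation, translation-prompt, and assembly stages above can be sketched roughly as follows. This is a minimal illustration, not the developer's code: the function names, the 2,000-character chunk limit, and the prompt wording are all assumptions, and the model call itself is left out because the source does not say which local inference server is used. The Pandoc invocation uses only the standard `-o` and `--resource-path` options.

```python
import re


def split_into_chunks(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars.

    Paragraphs are never split, so a single oversized paragraph
    becomes its own (longer) chunk. The limit is an assumed value.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the chunk before it overflows
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_translation_prompt(chunk: str, glossary: str, target_language: str) -> str:
    """Attach the global glossary (the "Super Bible") to every segment.

    The prompt wording is hypothetical; only the idea of sending the
    glossary with each segment comes from the article.
    """
    return (
        f"Translate the passage below into {target_language}.\n"
        "Keep Markdown formatting such as **bold** and headings.\n"
        "Follow this glossary for names, tone, and atmosphere:\n"
        f"{glossary}\n\n"
        f"Passage:\n{chunk}"
    )


def pandoc_epub_command(markdown_path: str, epub_path: str, image_dir: str) -> list[str]:
    """Build a Pandoc command line that converts Markdown to ePub.

    --resource-path tells Pandoc where to look for referenced images.
    """
    return ["pandoc", markdown_path, "-o", epub_path, f"--resource-path={image_dir}"]
```

The command list returned by `pandoc_epub_command` could be passed to `subprocess.run(..., check=True)` once all translated segments have been joined into a single Markdown file. Sending the same glossary with every chunk is what keeps names and tone consistent, at the cost of a longer prompt per segment.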

Automation Features

The system includes a monitoring script that watches a designated folder. Users simply drop a PDF into this folder, and the pipeline automatically processes it. After several hours, the system outputs both the translated ePub and a receipt showing processing time.
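
A drop-folder monitor of this kind could look something like the sketch below. It is an assumption-laden illustration using only the standard library: the source does not say whether the real script polls or uses filesystem events, and `find_new_pdfs`, `process_with_receipt`, `watch`, and the `translate` callback are hypothetical names.

```python
import time
from pathlib import Path
from typing import Callable


def find_new_pdfs(inbox: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in the inbox that have not been handled yet."""
    new = sorted(p for p in inbox.glob("*.pdf") if p.name not in seen)
    seen.update(p.name for p in new)  # remember them so they run only once
    return new


def process_with_receipt(pdf: Path, translate: Callable[[Path], Path]) -> str:
    """Run the pipeline on one PDF and return a one-line receipt.

    `translate` stands in for the whole pipeline and is assumed to
    return the path of the finished ePub.
    """
    start = time.monotonic()
    output = translate(pdf)
    elapsed = time.monotonic() - start
    return f"{pdf.name} -> {output.name} in {elapsed:.1f}s"


def watch(inbox: Path, translate: Callable[[Path], Path], poll_seconds: float = 10.0) -> None:
    """Poll the inbox forever, processing each new PDF as it appears."""
    seen: set[str] = set()
    while True:
        for pdf in find_new_pdfs(inbox, seen):
            print(process_with_receipt(pdf, translate))
        time.sleep(poll_seconds)
```

Polling is the simplest option for jobs that take hours, since a delay of a few seconds before pickup is negligible. A production version would also need to confirm that a PDF has finished copying before starting on it.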

The developer notes the results are surprisingly effective, though not 100% perfect, and mentions having several improvement ideas. The entire system runs locally on a personal computer without requiring external services.

📖 Read the full source: r/LocalLLaMA
