Modified vLLM 0.17.0 runs on Tesla P40 for real-time transcription with Qwen3 ASR 1.7B

✍️ OpenClawRadar 📅 Published: March 9, 2026

A developer has modified vLLM 0.17.0 to run on Tesla P40 GPUs, enabling real-time lecture transcription with the Qwen3 ASR 1.7B model. The P40 is built on the older Pascal architecture, which newer inference engines typically do not support.
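To make the Pascal limitation concrete, here is a minimal sketch of how a GPU's compute capability could be compared against an engine's minimum. The Tesla P40 reports compute capability 6.1. The 7.0 threshold is an assumption based on upstream vLLM's documented GPU requirement, and the helper names are hypothetical, not part of vLLM or the fork.

```python
# Assumption: 7.0 is the minimum compute capability documented by upstream vLLM.
MIN_UPSTREAM_CAPABILITY = (7, 0)

# The Tesla P40 is a Pascal card with compute capability 6.1.
TESLA_P40 = (6, 1)


def is_below_upstream_minimum(capability, minimum=MIN_UPSTREAM_CAPABILITY):
    """Return True when a (major, minor) capability is older than the minimum."""
    return tuple(capability) < tuple(minimum)


def local_gpu_capability(device_index=0):
    """Query the local GPU through PyTorch; return None if no CUDA device is usable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    print("P40 below upstream minimum:", is_below_upstream_minimum(TESLA_P40))
    detected = local_gpu_capability()
    if detected is not None:
        print("Local GPU capability:", detected)
```

A check like this only shows why a stock build would refuse the card; it says nothing about what the fork actually changed.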

Key Details

The developer was working on a personal project for real-time lecture transcription. They initially planned to use the Qwen3 ASR 1.7B model but found that true real-time transcription is supported only through vLLM. Rather than fall back to splitting the audio into chunks, they tried an experimental modification.
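For context, the chunking alternative the developer decided against might look roughly like the sketch below: the audio is cut into fixed-length, slightly overlapping windows that are transcribed one at a time. The function name, the 16 kHz sample rate, and the window sizes are illustrative assumptions, not details from the source.

```python
def chunk_samples(samples, sample_rate=16000, chunk_seconds=5.0, overlap_seconds=0.5):
    """Yield overlapping fixed-length windows from a sequence of audio samples.

    All defaults are illustrative; the source does not describe a chunking setup.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    step = chunk_len - int(sample_rate * overlap_seconds)
    if chunk_len <= 0 or step <= 0:
        raise ValueError("chunk length must be positive and larger than the overlap")
    for start in range(0, len(samples), step):
        yield samples[start:start + chunk_len]
        # Stop once a window has reached the end of the recording.
        if start + chunk_len >= len(samples):
            break


if __name__ == "__main__":
    # Ten fake samples at 1 Hz: 4-second windows with 1 second of overlap.
    for window in chunk_samples(list(range(10)), sample_rate=1,
                                chunk_seconds=4, overlap_seconds=1):
        print(window)
```

Each window has to be complete before it can be transcribed, so latency is bounded below by the window length, which is the usual reason chunking falls short of true streaming.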

Using Codex, they modified vLLM to run on the Pascal architecture, which let them run the Qwen3 ASR 1.7B model on their Tesla P40 server GPU. They report near-complete hardware acceleration and fully real-time transcription.

The modified vLLM fork is available at: https://github.com/uaysk/vllm-pascal

Next Steps and Challenges

The developer's next goal is to try running Qwen3.5 models on this setup, but they note several technical issues. The vision functionality appears to be unavailable, and even the text-only capabilities present challenges. At this point, they are unsure whether it will work at all.

📖 Read the full source: r/LocalLLaMA
