Modified vLLM 0.17.0 runs on Tesla P40 for real-time transcription with Qwen3 ASR 1.7B

A developer has modified vLLM 0.17.0 to run on Tesla P40 GPUs, enabling real-time lecture transcription with the Qwen3 ASR 1.7B model. The P40 is built on NVIDIA's Pascal architecture, which newer inference engines typically do not support.
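To illustrate the kind of hardware gate involved, here is a minimal sketch that compares a GPU's CUDA compute capability against a minimum. The Tesla P40 reports compute capability 6.1. The 7.0 threshold is an assumption based on the requirement stock vLLM documents for NVIDIA GPUs, and the helper names `meets_minimum` and `local_gpu_capabilities` are hypothetical; none of this is taken from the fork itself.

```python
# Assumed threshold: stock vLLM documents compute capability 7.0 or higher.
MIN_CAPABILITY = (7, 0)


def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability is at or above the minimum."""
    return tuple(capability) >= tuple(minimum)


def local_gpu_capabilities():
    """Return {device label: (major, minor)} for visible CUDA devices.

    Returns an empty dict when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    return {
        f"{i}: {torch.cuda.get_device_name(i)}": torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    }


if __name__ == "__main__":
    for label, cap in local_gpu_capabilities().items():
        status = "ok" if meets_minimum(cap) else "below assumed minimum"
        print(f"{label}: sm_{cap[0]}{cap[1]} ({status})")
```

Under these assumptions, a Pascal card such as the P40 fails the check, which is consistent with the developer needing a modified build.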
Key Details
The developer was working on a personal project for real-time lecture transcription. They planned to use the Qwen3 ASR 1.7B model but found that true real-time transcription with it is supported only through vLLM. Rather than fall back to chunking audio samples, they tried an experimental modification of vLLM.
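For contrast, the chunking fallback the developer decided against might look roughly like the sketch below. It is not the developer's code: the function name `chunk_samples`, the 5-second window, and the 0.5-second overlap are all hypothetical values chosen for illustration. Each window would be sent to the ASR model as a separate request, so text for a window can only appear after that window has been fully captured.

```python
def chunk_samples(samples, sample_rate, chunk_seconds=5.0, overlap_seconds=0.5):
    """Yield (start_index, chunk) windows over a sequence of audio samples.

    Consecutive windows overlap by overlap_seconds so that words cut at a
    boundary appear whole in at least one window.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    step = chunk_len - int(overlap_seconds * sample_rate)
    if chunk_len <= 0 or step <= 0:
        raise ValueError("chunk_seconds must be positive and larger than overlap_seconds")
    for start in range(0, len(samples), step):
        yield start, samples[start:start + chunk_len]
        # Stop once a window has reached the end of the recording.
        if start + chunk_len >= len(samples):
            break


if __name__ == "__main__":
    # Ten dummy samples at 1 Hz, 4-second windows with 1 second of overlap.
    for start, chunk in chunk_samples(list(range(10)), sample_rate=1,
                                      chunk_seconds=4, overlap_seconds=1):
        print(start, chunk)
```

The trade-off is latency: with a fixed window, the delay before text appears is at least the window length, which is why a streaming path through vLLM was the more attractive option here.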
Using Codex, they modified vLLM to run on the Pascal architecture. This allowed them to run the Qwen3 ASR 1.7B model on their Tesla P40 server GPU. The result was near-complete hardware acceleration and fully real-time transcription.
The modified vLLM fork is available at: https://github.com/uaysk/vllm-pascal
Next Steps and Challenges
The developer's next goal is to try running Qwen3.5 models on this setup, but they report several technical issues: the vision functionality appears to be unavailable, and even text-only use presents challenges. At this point, they are unsure whether it will be possible.
📖 Read the full source: r/LocalLLaMA