Modified vLLM 0.17.0 runs on Tesla P40 for real-time transcription with Qwen3 ASR 1.7B

A developer has modified vLLM 0.17.0 to run on Tesla P40 GPUs, enabling real-time lecture transcription with the Qwen3 ASR 1.7B model. The P40 is built on the Pascal architecture, which newer inference engines typically do not support.
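To make the Pascal limitation concrete, here is a minimal sketch of the kind of compute-capability check that excludes the card. It assumes that upstream vLLM's documented minimum for NVIDIA GPUs is compute capability 7.0. The P40's capability of 6.1 is a hardware fact, but the helper name `meets_minimum` and the lookup table are illustrative only and are not taken from vLLM or the fork.

```python
# Assumption: upstream vLLM documents compute capability 7.0 as its minimum.
VLLM_MIN_CAPABILITY = (7, 0)

# (major, minor) compute capabilities for a few well-known NVIDIA GPUs.
KNOWN_CAPABILITIES = {
    "Tesla P40": (6, 1),   # Pascal
    "Tesla V100": (7, 0),  # Volta
    "Tesla T4": (7, 5),    # Turing
    "A100": (8, 0),        # Ampere
}


def meets_minimum(capability, minimum=VLLM_MIN_CAPABILITY):
    """Return True if a (major, minor) capability is at least the minimum."""
    # Tuples compare element by element, so (6, 1) < (7, 0) < (7, 5).
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    for name, cap in KNOWN_CAPABILITIES.items():
        status = "supported" if meets_minimum(cap) else "below minimum"
        print(f"{name}: sm_{cap[0]}{cap[1]} -> {status}")
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns the same kind of `(major, minor)` tuple and could be passed straight to `meets_minimum`.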
Key Details
The developer was working on a personal project for real-time lecture transcription. They initially planned to use the Qwen3 ASR 1.7B model but found that true real-time transcription is supported only through vLLM. Rather than fall back to chunking audio samples, they attempted an experimental modification.
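For context, the chunking fallback the developer decided against might look roughly like the sketch below: audio is cut into fixed-size windows, optionally overlapping, and each window is transcribed on its own. The function name `chunk_samples`, the 16 kHz sample rate, and the window sizes are all assumptions chosen for illustration; none of them come from the original post.

```python
def chunk_samples(samples, chunk_size, overlap=0):
    """Split a sequence of audio samples into fixed-size, optionally overlapping chunks.

    The final chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk_size])
        # Stop once a chunk has reached the end of the input.
        if start + chunk_size >= len(samples):
            break
    return chunks


if __name__ == "__main__":
    SAMPLE_RATE = 16_000               # assumed sample rate in Hz
    audio = [0.0] * (SAMPLE_RATE * 5)  # 5 seconds of silence as stand-in audio
    # 2-second windows with 0.5 seconds of overlap (hypothetical values).
    windows = chunk_samples(audio, chunk_size=2 * SAMPLE_RATE, overlap=SAMPLE_RATE // 2)
    print(len(windows), [len(w) for w in windows])
```

The trade-off is latency: each window has to fill before it can be transcribed, and words that straddle a boundary need the overlap to be reconciled afterward. That is presumably why streaming through vLLM was the more attractive option.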
Using Codex, they modified vLLM to run on the Pascal architecture. This allowed them to run the Qwen3 ASR 1.7B model on their Tesla P40 server GPU. The result was near-complete hardware acceleration and fully real-time transcription.
The modified vLLM fork is available at: https://github.com/uaysk/vllm-pascal
Next Steps and Challenges
The developer's next goal is to try running Qwen3.5 models on this setup. However, they note several technical issues. The vision functionality appears to be unavailable, and even using only the text capabilities presents challenges. At this point, they are unsure whether it will be possible.
📖 Read the full source: r/LocalLLaMA