Deterministic vs. Probabilistic Code Generation

Noah Hall, writing for The Tech Enabler, draws a sharp line between deterministic and probabilistic code generation. He uses Bun's recent vibe-coded conversion of a million-line codebase from Zig to Rust as a cautionary tale. His core argument: deterministic systems produce consistent, reviewable results; LLMs introduce uncertainty that makes code review impossible at scale.

Deterministic Code Generation

Hall points to established deterministic tooling: Python's 2to3 for the Python 2 to 3 migration, and compilers for languages like Elm, PureScript, and TypeScript, which always produce the same JavaScript for the same source. His own language Derw can output JavaScript, TypeScript, or English; Tegan outputs JavaScript or Go; Mojie targets JavaScript, Python, or English. All are based on AST-to-AST transformation: given the same input, you always get the same output. That consistency is what makes defects tractable: "If a bug is consistent, we can fix it. If a bug is inconsistent, it becomes exponentially more difficult to fix."
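
To make the "same input, same output" property concrete, here is a minimal sketch of a tree-to-text emitter. The Expr type and the emitJs function are hypothetical and invented for illustration; they are not taken from Derw, Tegan, Mojie, or any real compiler. The point is only that the emitter is a pure function of the tree it is given.

```typescript
// A toy expression AST (hypothetical; not the AST of any real language).
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "call"; callee: string; args: Expr[] };

// Pure function: the emitted JavaScript depends only on the input tree.
// No clock, no randomness, no network, so repeated runs cannot diverge.
function emitJs(e: Expr): string {
  switch (e.kind) {
    case "num":
      return String(e.value);
    case "var":
      return e.name;
    case "add":
      return `(${emitJs(e.left)} + ${emitJs(e.right)})`;
    case "call":
      return `${e.callee}(${e.args.map(emitJs).join(", ")})`;
  }
}

// The tree for the source expression add(2, 3).
const tree: Expr = {
  kind: "call",
  callee: "add",
  args: [
    { kind: "num", value: 2 },
    { kind: "num", value: 3 },
  ],
};

const firstRun = emitJs(tree);
const secondRun = emitJs(tree);
console.log(firstRun); // add(2, 3)
console.log(firstRun === secondRun); // true
```

If an emitter like this has a bug, the bug shows up the same way for every input that reaches the faulty branch, which is what makes it reproducible and fixable.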

Probabilistic Code Generation

LLMs can produce different output on every run: the same prompt yields implementation A one time and implementation B the next. Hall created neuro-lingo three years ago as a parody: humans write only function signatures and comments, and LLMs generate the implementation fresh on each compilation. An example:

function add(a: number, b: number): number {
  // Add two numbers together
}
function main() {
  // Print "Hello World" to the console
  // Print the result of add(2, 3)
}

"Every time neuro-lingo is compiled, the code is generated from fresh by the LLMs. It's slightly different each time. Sometimes it introduces bugs. Sometimes it's clean and simple. Sometimes it's chaotic." Hall argues that fully AI-driven code flows are doing exactly this, but shipping to production with human accountability.

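The behaviour Hall describes can be simulated without calling any model. The sketch below is an assumption-laden stand-in: the candidates list and the generateBody and distinctOutputs functions are invented for illustration, and a random pick from three fixed strings is a deliberately crude proxy for LLM sampling. It is not how neuro-lingo works internally.

```typescript
// Bodies a model might plausibly emit for `add`; invented for illustration.
const candidates: string[] = [
  "return a + b;",
  "const sum = a + b; return sum;",
  "return a - b; // a sampled bug",
];

// Stand-in for an LLM call: picks a body using the supplied random source.
// rng is expected to return a value in [0, 1), like Math.random.
function generateBody(rng: () => number): string {
  const index = Math.floor(rng() * candidates.length);
  return candidates[index];
}

// "Compile" the same unchanged signature several times and
// collect the distinct implementations that come back.
function distinctOutputs(runs: number, rng: () => number): Set<string> {
  const seen = new Set<string>();
  for (let i = 0; i < runs; i++) {
    seen.add(generateBody(rng));
  }
  return seen;
}

const seen = distinctOutputs(50, Math.random);
console.log(`${seen.size} distinct implementations from one unchanged input`);
```

Unlike a deterministic emitter, the input here never changes, yet the reviewer cannot assume that the output they approved yesterday is the output that gets generated today.
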
The "There Are Tests" Fallacy

Tests alone can't guarantee quality. Hall cites SQLite as the most tested codebase: 155.8 KSLOC of C code against 92,053.1 KSLOC of test code, roughly 590 times as much. Despite 100% branch coverage, millions of test cases, and extensive harnesses, SQLite still relies on human review. Hall contrasts that with the Bun conversion: "It is not possible for a human to review 1 million lines of changes in 9 days. Bun has not reviewed the code they have merged to master."
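
The figures above can be checked with some quick arithmetic. The SQLite line counts and the "1 million lines in 9 days" figure come from the text; the single reviewer and the 8-hour review day are assumptions added here purely to give a sense of scale.

```typescript
// SQLite figures as quoted above, in thousands of source lines.
const sqliteCodeKsloc = 155.8;
const sqliteTestKsloc = 92053.1;
const testToCodeRatio = sqliteTestKsloc / sqliteCodeKsloc;

// Review load implied by the quote: 1 million changed lines in 9 days.
const changedLines = 1000000;
const days = 9;
const linesPerDay = changedLines / days;

// Assumption (not from the source): one reviewer, 8 hours of review per day.
const reviewHoursPerDay = 8;
const linesPerSecond = linesPerDay / (reviewHoursPerDay * 3600);

console.log(`test-to-code ratio: ${testToCodeRatio.toFixed(1)}`); // 590.8
console.log(`lines per day: ${Math.round(linesPerDay)}`); // 111111
console.log(`lines per second: ${linesPerSecond.toFixed(2)}`); // 3.86
```

Under those assumptions a reviewer would need to read close to four lines every second, all day, for nine days, which is the scale behind Hall's claim that the change cannot have been reviewed.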

Hall concludes that deterministic code generation still needs validation, and probabilistic generation creates risk that scales with line count. The source article goes deeper on each example.

📖 Read the full source: HN AI Agents
