The 100,000 Whys of AI: How Quasi-Deterministic LLM Output Creates Telltale Slop

In a recent Substack post, lcamtuf (the security researcher known for AFL and other tools) tackles a recurring debate: whether you can distinguish human-written text from LLM output. His argument is grounded in a concrete observation about how current models behave in practice.
The Core Claim: Quasi-Determinism
LLMs are state-of-the-art statistical models of human language. In theory, their output should be indistinguishable from human text under any statistical test. But lcamtuf argues that the real distinguishing feature is quasi-determinism: give a hundred 'authors' a similar prompt — say, 'generate a reference book for children' — and the model will produce functionally identical output about 80% of the time.
He illustrates this with a collage of ~220 Amazon book covers from a search for '100000 whys'. The image shows clusters of nearly identical covers:
- The top two rows all feature a roaring T-Rex on the left
- Recurring motifs: red-and-white cartoon rocket, golden retriever, lion
- Author names include an improbable number of 'Brights': Ethan, Nolan, Pamela, Daniel, Thomas, Andrew W., Mayan, Mary, and Levi all carry the surname Bright
Why This Matters for Developers
For teams shipping AI-generated content or building on LLM APIs, the implication is that you can't rely on randomness to mask AI origins. The statistical signature isn't about individual word choices — it's about the model returning the same high-level response structure to similar prompts. If your workflow involves generating many variations from similar prompts, the output will cluster, making it easy to spot.
lcamtuf notes: 'This is a fuzzy signal, so you shouldn't fire your intern when they say "it's not this — it's that". But in more casual settings, it's OK to trust your gut.'
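The clustering described above can be checked on your own generated output before publishing. The following is a minimal sketch, not anything from lcamtuf's post: it assumes that word-shingle Jaccard similarity is a reasonable stand-in for "functionally identical", and the function names, the 0.4 threshold, and the sample blurbs are all hypothetical. Like the signal itself, the score is fuzzy and only flags candidates for a human to review.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_pairs(texts, threshold=0.4, k=3):
    """Return (i, j, similarity) for each pair of texts at or above threshold.

    The threshold is an arbitrary assumption; tune it on your own data.
    """
    sets = [shingles(t, k) for t in texts]
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        sim = jaccard(sets[i], sets[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Hypothetical sample blurbs, invented for illustration only.
samples = [
    "discover amazing facts about dinosaurs rockets and animals for curious kids",
    "discover amazing facts about dinosaurs rockets and oceans for curious kids",
    "a field guide to soldering surface mount parts at home",
]

for i, j, sim in near_duplicate_pairs(samples):
    print(f"samples {i} and {j} overlap: {sim:.2f}")
```

With these samples, only the first two blurbs are flagged. Comparing every pair is quadratic in the number of texts, which is fine for a batch of a few hundred drafts; larger corpora would need an approximate method such as MinHash, which this sketch does not implement.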
Practical Takeaway
If you're using an LLM to automate blogging, be aware that your content may end up looking exactly like everyone else's. The post's P.S. is blunt: 'yes, the tech is amazing, but chances are, your publication could be renamed to "100,000 Whys".'
The post also links to examples beyond this single title and notes that the original 'One Hundred Thousand Whys' is a 1929 Soviet children's book popular in China, which likely seeded the prompt term.
📖 Read the full source: HN LLM Tools