Miasma: A tool to trap AI web scrapers with poisoned data

What Miasma does
Miasma is a tool designed to trap AI web scrapers by serving them poisoned training data alongside multiple self-referential links, creating what the developers call an "endless buffet of slop for the slop machines." The tool is built to be fast with minimal memory footprint.
Installation and setup
Install with Cargo: cargo install miasma, or download a pre-built binary from the project's releases.
Start with default configuration: miasma
View all configuration options: miasma --help
How to trap scrapers
The typical setup involves:
- Embedding hidden links on your site pointing to a specific path (e.g., /bots) with attributes that make them invisible to human visitors but visible to scrapers:
<a href="/bots" style="display: none;" aria-hidden="true" tabindex="1">Amazing high quality data here!</a>
- Configuring a reverse proxy (like Nginx) to route that path to Miasma:
location ~ ^/bots($|/.*)$ { proxy_pass http://localhost:9855; }
- Running Miasma with specific parameters:
miasma --link-prefix '/bots' -p 9855 -c 50
The -c 50 flag limits max in-flight connections to 50, which results in 50-60 MB peak memory usage. Requests exceeding this limit receive a 429 response.
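The connection cap described above can be pictured with a small sketch. This is not Miasma's implementation (Miasma is installed through Cargo, and its internals are not shown here); it is a hedged Python illustration of the general technique of rejecting requests with a 429 once a fixed number are already in flight. The names InFlightLimiter and handle_request are hypothetical.

```python
import threading
from typing import Callable, Tuple


class InFlightLimiter:
    """Admit at most `max_in_flight` requests at once (hypothetical helper)."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle_request(limiter: InFlightLimiter, serve: Callable[[], str]) -> Tuple[int, str]:
    """Return (status, body); respond 429 when the limiter is full."""
    if not limiter.try_acquire():
        return 429, "Too Many Requests"
    try:
        return 200, serve()
    finally:
        # Always free the slot, even if serve() raises.
        limiter.release()
```

Rejecting immediately instead of queueing keeps memory bounded by the number of admitted requests, which matches the idea of tying peak memory to the in-flight limit.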
Configuration options
- --port: Default 9999. The port the server should bind to.
- --host: Default localhost. The host address the server should bind to.
- --max-in-flight: Default 500. Maximum number of allowable in-flight requests.
- --link-prefix: Default /. Prefix for self-directing links (should match your hosting path).
- --link-count: Default 5. Number of self-directing links to include in each response page.
- --force-gzip: Default false. Always gzip responses regardless of the Accept-Encoding header.
- --poison-source: Default https://rnsaffn.com/poison2/. Proxy source for poisoned training data.
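To make the roles of --link-prefix and --link-count concrete, here is a rough Python sketch of how a trap page could be assembled: some poisoned text followed by a fixed number of links that all point back under the same prefix. The function build_trap_page, the random-token URL scheme, and the HTML layout are assumptions made for illustration; the source does not describe how Miasma actually builds its pages.

```python
import html
import secrets


def build_trap_page(poison_text: str, link_prefix: str = "/", link_count: int = 5) -> str:
    """Return an HTML page with poisoned text plus links back into the trap."""
    # Normalise the prefix so "/" and "/bots" both yield clean paths.
    prefix = link_prefix.rstrip("/")
    links = []
    for _ in range(link_count):
        # Hypothetical scheme: a random token makes each link look like a new page.
        token = secrets.token_hex(8)
        links.append(f'<li><a href="{prefix}/{token}">More data {token}</a></li>')
    items = "".join(links)
    body = html.escape(poison_text)
    return f"<!doctype html><html><body><p>{body}</p><ul>{items}</ul></body></html>"
```

Because every generated link stays under the prefix, a reverse proxy rule that matches the prefix keeps routing the scraper back to the same service, which is why the prefix should match the hosting path.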
Important considerations
The developers recommend protecting friendly bots and search engines in your robots.txt file:
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: Slurp
User-agent: SomeOtherNiceBot
Disallow: /bots
Allow: /
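One way to sanity-check rules like these before deploying them is Python's standard urllib.robotparser module. The sketch below parses the same rules and asks whether a listed crawler may fetch the trap path. The helper name is_allowed and the agent name ExampleScraper are made up for the example, and real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the text, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: Slurp
User-agent: SomeOtherNiceBot
Disallow: /bots
Allow: /
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Report whether `user_agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleScraper"):
        for path in ("/", "/bots", "/bots/page"):
            print(agent, path, is_allowed(agent, path))
```

Listed crawlers are told to stay out of /bots while keeping access to the rest of the site. An agent that matches no group has no rules applied to it, so a scraper that follows the hidden link is not steered away from the trap.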
Miasma is licensed under GPL-3.0 and the developers note that "primarily AI-generated contributions will be automatically rejected."
📖 Read the full source: HN AI Agents