Miasma: A tool to trap AI web scrapers with poisoned data

✍️ OpenClawRadar · 📅 Published: March 29, 2026

What Miasma does

Miasma is a tool designed to trap AI web scrapers by serving them poisoned training data alongside multiple self-referential links, creating what the developers call an "endless buffet of slop for the slop machines." Because every page links back into the trap, a scraper that follows those links never runs out of pages to fetch. The tool is built to be fast, with a minimal memory footprint.
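
To make the "self-referential links" idea concrete, here is a minimal sketch of what one trap page could look like. It is an illustration only, not Miasma's actual implementation: the function name render_trap_page, the random path scheme, and the "Read more" link text are all hypothetical. Only the ideas of a link prefix and a per-page link count come from the article.

```python
import secrets
from html import escape


def render_trap_page(body_text: str, link_prefix: str = "/", link_count: int = 5) -> str:
    """Return an HTML page with filler text and links that lead back into the trap.

    Hypothetical helper for illustration; not part of Miasma.
    """
    prefix = link_prefix.rstrip("/")
    links = []
    for _ in range(link_count):
        # A random path segment makes each link look like a new, uncrawled page.
        target = f"{prefix}/{secrets.token_hex(8)}"
        links.append(f'<a href="{escape(target, quote=True)}">Read more</a>')
    return (
        "<html><body>"
        f"<p>{escape(body_text)}</p>"
        + "".join(links)
        + "</body></html>"
    )
```

Every link stays under the same prefix, so a reverse proxy rule for that prefix keeps sending the scraper back to the same server.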

Installation and setup

Install with Cargo (cargo install miasma), or download a pre-built binary from the project's releases.

Start with default configuration: miasma

View all configuration options: miasma --help

How to trap scrapers

The typical setup involves:

  1. Embedding hidden links on your site pointing to a specific path (e.g., /bots) with attributes that make them invisible to human visitors but visible to scrapers:
    <a href="/bots" style="display: none;" aria-hidden="true" tabindex="-1">Amazing high quality data here!</a>
  2. Configuring a reverse proxy (like Nginx) to route that path to Miasma:
    location ~ ^/bots($|/.*)$ {
      proxy_pass http://localhost:9855;
    }
  3. Running Miasma with specific parameters:
    miasma --link-prefix '/bots' -p 9855 -c 50

The -c 50 flag caps in-flight connections at 50, which results in a peak memory usage of roughly 50-60 MB. Requests beyond that limit receive an HTTP 429 (Too Many Requests) response.
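
The in-flight limit described above can be sketched as a simple counter: a request claims a slot if one is free and is rejected with 429 otherwise. This is a generic illustration of the technique, not Miasma's code. The names InFlightLimiter and status_for_new_request are hypothetical.

```python
import threading


class InFlightLimiter:
    """Cap the number of requests being handled at the same time."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._current = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Claim a slot if one is free; return False when at capacity."""
        with self._lock:
            if self._current >= self._max:
                return False
            self._current += 1
            return True

    def release(self) -> None:
        """Give a slot back once the request has finished."""
        with self._lock:
            if self._current > 0:
                self._current -= 1


def status_for_new_request(limiter: InFlightLimiter) -> int:
    """Return 200 if a slot was claimed, 429 if the server is at capacity."""
    return 200 if limiter.try_acquire() else 429
```

Rejecting excess requests immediately, rather than queueing them, is what keeps memory bounded: the server never holds more than the configured number of responses at once.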


Configuration options

  • --port: Default 9999 - The port the server should bind to
  • --host: Default localhost - The host address the server should bind to
  • --max-in-flight: Default 500 - Maximum number of allowable in-flight requests
  • --link-prefix: Default / - Prefix for self-directing links (should match your hosting path)
  • --link-count: Default 5 - Number of self-directing links to include in each response page
  • --force-gzip: Default false - Always gzip responses regardless of Accept-Encoding header
  • --poison-source: Default https://rnsaffn.com/poison2/ - Proxy source for poisoned training data
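
If you launch Miasma from a script, the long options above can be assembled from a dictionary. The sketch below is a hypothetical helper, not part of Miasma. It assumes each option is passed as "--name value", the form the article shows for --link-prefix, and it deliberately leaves out --force-gzip because the article does not say whether that flag takes a value.

```python
import shlex

# Long option names taken from the list above, written with underscores.
KNOWN_OPTIONS = {
    "port",
    "host",
    "max_in_flight",
    "link_prefix",
    "link_count",
    "poison_source",
}


def build_miasma_command(options: dict) -> list:
    """Turn {"port": 9855} into ["miasma", "--port", "9855"].

    Hypothetical helper; assumes every option uses the "--name value" form.
    """
    argv = ["miasma"]
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def as_shell_line(argv: list) -> str:
    """Quote the argument list so it can be pasted into a shell."""
    return shlex.join(argv)
```

Passing the resulting list to subprocess.run avoids shell quoting problems; as_shell_line is only there for logging or copy-pasting.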

Important considerations

The developers recommend keeping friendly bots and search engines out of the trap by disallowing the trap path for them in your robots.txt file:

User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: Slurp
User-agent: SomeOtherNiceBot
Disallow: /bots
Allow: /
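
Before deploying, you can check that these rules do what you expect with Python's standard urllib.robotparser module. The sketch below parses the same rules and asks whether a given crawler may fetch a path. The wrapper name can_fetch and the example paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the article, with /bots as the trap path.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: Slurp
User-agent: SomeOtherNiceBot
Disallow: /bots
Allow: /
"""


def can_fetch(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether user_agent may fetch path under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

For example, can_fetch("Googlebot", "/bots/page") is False while can_fetch("Googlebot", "/about") is True. Crawlers not named in the file are not restricted by these rules, so only well-behaved bots that are listed and that honor robots.txt are steered away from the trap.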

Miasma is licensed under GPL-3.0, and the developers note that "primarily AI-generated contributions will be automatically rejected."

📖 Read the full source: HN AI Agents
