AgentCrawl Update Adds Critical Crawler Features and Enhancements

The latest update to AgentCrawl, a TypeScript scraper/crawler aimed at developers building AI agents, focuses on production readiness. It adds crawler correctness and politeness controls, caching, resumable crawls, and richer data extraction.
Key Details
- Removed Tool Adapters: The update eliminates the tool adapters for the agents SDK and Vercel AI SDK, allowing users to define their tools independently.
- Updated Libraries: The package now includes the latest version of Zod for better data validation.
- Crawler Correctness: Robots.txt compliance is now available as an opt-in feature and supports Disallow/Allow and Crawl-delay directives. Opt-in sitemap seeding from /sitemap.xml is also available.
- URL Normalization: Improved URL normalization strips tracking parameters and can handle canonical normalization.
- Throttling Options: The crawler supports per-host throttling with configurable perHostConcurrency and minDelayMs.
- Caching: An opt-in disk HTTP cache for static fetches supports ETag and Last-Modified. It caches the ScrapedPage after cleaning and markdown conversion, and handles 304 responses by serving the cached body.
- Resumable Crawls: A new opt-in crawlState persistence saves the crawl's frontier, including the queue, visited pages, queued items, errors, and max depth, so a crawl can resume without re-visiting pages.
- Data Extraction Improvements: The scraper now supports structured metadata extraction, including Canonical URL, OpenGraph, Twitter cards, and JSON-LD, kept in metadata.structured.
- Chunking for Agents: Opt-in chunking returns page.chunks[] with an approximate token size, heading path, and citation anchor, which is useful for RAG/tool loops.
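To make the URL normalization item concrete, here is a minimal sketch of stripping tracking parameters before a URL enters the crawl queue. The function name normalizeUrl and the parameter list are assumptions for illustration; AgentCrawl's actual rules may differ.

```typescript
// Hypothetical helper, not AgentCrawl's API: the tracking-parameter list
// below is an illustrative assumption.
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "msclkid", "mc_eid"]);

function normalizeUrl(raw: string): string {
  const url = new URL(raw); // the URL parser already lowercases the host
  url.hash = ""; // fragments do not change the fetched resource

  // Collect keys first so we do not mutate the params while iterating.
  const toDelete: string[] = [];
  url.searchParams.forEach((_value, key) => {
    const k = key.toLowerCase();
    if (k.startsWith("utm_") || TRACKING_PARAMS.has(k)) toDelete.push(key);
  });
  for (const key of toDelete) url.searchParams.delete(key);

  // Sort the remaining params so equivalent URLs compare equal.
  url.searchParams.sort();
  return url.toString();
}
```

Normalizing before deduplication means two links that differ only in campaign tags count as one page in the visited set.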
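The per-host throttling item names two settings, perHostConcurrency and minDelayMs. The sketch below shows one way those two limits could interact. The HostThrottle class and its methods are hypothetical and only borrow the option names from the release notes.

```typescript
// Hypothetical gate: a request to a host may start only if the host is under
// its concurrency limit AND enough time has passed since the last start.
class HostThrottle {
  private perHostConcurrency: number;
  private minDelayMs: number;
  private active = new Map<string, number>(); // in-flight requests per host
  private lastStart = new Map<string, number>(); // last start time per host

  constructor(perHostConcurrency: number, minDelayMs: number) {
    this.perHostConcurrency = perHostConcurrency;
    this.minDelayMs = minDelayMs;
  }

  // Returns true and records the start if both limits allow it.
  tryStart(host: string, now: number = Date.now()): boolean {
    const active = this.active.get(host) ?? 0;
    if (active >= this.perHostConcurrency) return false;
    const last = this.lastStart.get(host);
    if (last !== undefined && now - last < this.minDelayMs) return false;
    this.active.set(host, active + 1);
    this.lastStart.set(host, now);
    return true;
  }

  // Call when a request completes, successfully or not.
  finish(host: string): void {
    const active = this.active.get(host) ?? 0;
    this.active.set(host, Math.max(0, active - 1));
  }
}
```

Passing the clock value in as an argument keeps the gate deterministic, so a scheduler can poll it or compute a wait time without real timers.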
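The caching item relies on standard HTTP conditional requests: the client sends If-None-Match and If-Modified-Since, and a 304 response means the cached copy is still valid. The sketch below illustrates that flow with an in-memory entry. The CacheEntry shape and function names are assumptions, and disk persistence is omitted.

```typescript
// Hypothetical cache entry shape; AgentCrawl's on-disk format is not shown here.
interface CacheEntry {
  etag?: string;
  lastModified?: string;
  body: string;
}

// Build the validator headers for a revalidation request.
function conditionalHeaders(entry?: CacheEntry): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry?.etag) headers["If-None-Match"] = entry.etag;
  if (entry?.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Pick the body to use: on 304 the server sends no body, so serve the cache.
function resolveBody(status: number, freshBody: string, entry?: CacheEntry): string {
  if (status === 304) {
    if (!entry) throw new Error("Received 304 without a cached entry");
    return entry.body;
  }
  return freshBody;
}
```

In use, the headers from conditionalHeaders would be attached to the fetch, and the response status and text passed to resolveBody.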
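For the resumable crawls item, the core idea is that the frontier must survive a restart. The following sketch serializes a simplified state to JSON and restores it. The CrawlState fields follow the release notes loosely (the separate queued-items set is left out), and all names here are assumptions.

```typescript
// Simplified, hypothetical state shape; not AgentCrawl's persisted format.
interface QueueItem {
  url: string;
  depth: number;
}

interface CrawlState {
  queue: QueueItem[];
  visited: Set<string>;
  errors: { url: string; message: string }[];
  maxDepth: number;
}

// Sets are not JSON-serializable, so store visited URLs as an array.
function serializeState(state: CrawlState): string {
  return JSON.stringify({
    queue: state.queue,
    visited: Array.from(state.visited),
    errors: state.errors,
    maxDepth: state.maxDepth,
  });
}

function restoreState(json: string): CrawlState {
  const raw = JSON.parse(json);
  return {
    queue: raw.queue,
    visited: new Set<string>(raw.visited),
    errors: raw.errors,
    maxDepth: raw.maxDepth,
  };
}

// Pop the next URL that is within depth and not yet visited.
function nextToVisit(state: CrawlState): QueueItem | undefined {
  while (state.queue.length > 0) {
    const item = state.queue.shift()!;
    if (item.depth > state.maxDepth || state.visited.has(item.url)) continue;
    state.visited.add(item.url);
    return item;
  }
  return undefined;
}
```

Because the visited set is restored along with the queue, a resumed crawl skips pages it already fetched.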
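The chunking item describes chunks that carry an approximate token size, a heading path, and a citation anchor. Here is a rough sketch of how such chunks could be derived from markdown. The Chunk field names, the four-characters-per-token estimate, and the slug-based anchor are all assumptions, not AgentCrawl's actual output format.

```typescript
// Hypothetical chunk shape for illustration only.
interface Chunk {
  text: string;
  approxTokens: number; // rough estimate: about 4 characters per token
  headingPath: string[]; // e.g. ["Guide", "Install"]
  anchor: string; // slug of the nearest heading, usable as a citation link
}

function slugify(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: { level: number; title: string }[] = []; // open headings
  let buffer: string[] = [];

  // Emit the buffered text as one chunk under the current heading path.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    buffer = [];
    if (!text) return;
    const titles = stack.map((h) => h.title);
    chunks.push({
      text,
      approxTokens: Math.ceil(text.length / 4),
      headingPath: titles,
      anchor: titles.length > 0 ? "#" + slugify(titles[titles.length - 1]) : "",
    });
  };

  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      flush();
      const level = m[1].length;
      // Close headings at the same or deeper level before opening this one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, title: m[2].trim() });
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

An agent can then cite a chunk by its anchor and use approxTokens to budget how many chunks fit in a prompt.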
Who It's For
This update is aimed at developers building AI agents that need efficient, structured web scraping.
📖 Read the full source: r/LocalLLaMA