LamBench: A Lambda Calculus Benchmark Suite for AI Coding Agents

Victor Taelin released LamBench v1, a benchmark framework designed to test AI coding agents on lambda calculus problems. The project is hosted on GitHub at github.com/VictorTaelin/LamBench and includes a live site at victortaelin.github.io/lambench/.
Key Details
- Metrics: The benchmark measures three axes: :intelligence, :speed, and :elegance.
- Components: A set of :problems and a :matrix for scoring results.
- Version: v1 (initial release).
LamBench is part of a broader effort by Taelin to create rigorous evaluations for AI systems in symbolic computation. For context, lambda calculus is a formal system in mathematical logic and computer science that expresses computation purely through function abstraction and application. It is often used to test reasoning and functional programming capabilities, which makes this benchmark particularly relevant for AI coding agents that need to handle symbolic manipulation, recursion, and higher-order functions.
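To give a sense of the higher-order reasoning that lambda calculus involves, here is a minimal sketch of Church numerals in Python. It is an illustration only and is not taken from LamBench's problem set. The names zero, succ, add, mul, and to_int are chosen here for the example.

```python
# Illustrative only: not a LamBench problem. All names below are hypothetical.
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# Multiplication: repeat "apply f n times" m times.
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(n):
    """Decode a Church numeral by counting applications of +1 from 0."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)

print(to_int(add(two)(three)))  # 5
print(to_int(mul(two)(three)))  # 6
```

Numbers here are encoded entirely as functions, so even basic arithmetic requires composing and applying higher-order functions correctly.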
Who It's For
AI researchers and developers building or evaluating coding agents, especially those working with functional programming or symbolic reasoning tasks.
📖 Read the full source: HN AI Agents