AIsbf 0.9.8 adds caching, routing improvements, and expanded AI service support

AIsbf (AI Should Be Free) 0.9.8 is an API proxy/router that provides an OpenAI-compatible interface to various AI endpoint services, aiming to make LLM usage more cost-effective. It's multiuser and can scale from small setups to large infrastructure.
Key features in version 0.9.8
- Cache support for Redis, SQLite, MySQL, and file-based storage
- Additional context condensation methods
- Native prompt caching and request caching support
- Faster and improved semantic prompt-based routing for automatic service selection
- Full OAuth2 support for Claude.ai subscribers
- Full OAuth2 support for Amazon Kiro-cli subscribers
- Full OAuth2 support for OpenAI Codex subscribers
- Full support for Kilo.ai subscribers using tokens or OAuth2
- Multiple bug fixes and minor feature additions
This type of tool is useful for developers who work with multiple AI services and want a unified interface while optimizing costs through intelligent routing and caching.
📖 Read the full source: r/LocalLLaMA
👀 See Also

altRAG: Replace Vector DB RAG with 2KB Pointer Files for AI Coding Agents
altRAG is a Python tool that replaces vector database RAG with lightweight pointer files. It scans Markdown/YAML skill files to create a 2KB skeleton file mapping sections to exact line numbers and byte offsets, allowing AI agents to read only needed sections instead of entire files.

OpenClaw CoreBrain Plugin: Persistent Memory for AI Coding Agents
A new plugin called CoreBrain addresses OpenClaw's memory issues by storing information outside the context window in a knowledge graph and auto-injecting it before every query, eliminating the need for tool calls and optional memory invocation.

Open-source tool enables Claude to control Unreal Engine directly
soft-ue-cli is a Python tool with a C++ plugin that allows Claude Code and Claude Desktop to execute commands in Unreal Engine without editor interaction, featuring 60+ operations including blueprint editing, actor spawning, and performance profiling.

Echo-TTS Ported to Apple Silicon with MLX for Native TTS with Voice Cloning
Echo-TTS, a 2.4B parameter diffusion text-to-speech model with voice cloning, has been ported from CUDA to run natively on Apple M-series silicon using MLX. On a base 16GB M4 Mac mini, a 5-second voice clone takes about 10 seconds to generate, while 30-second clones take about 60 seconds.