Kimi K2.7-Code: Open-Source Coding Model with Better Token Efficiency

Moonshot AI has released Kimi K2.7-Code, an open-source coding model available on Hugging Face under the moonshotai/Kimi-K2.7-Code namespace. The model is tagged as image-text-to-text and uses the Transformers library. It positions itself as a token-efficient alternative for code generation and understanding tasks.
Key Features
- Inference providers: Novita offers the model with live status, tool calling support (
toolCalling: true), and structured output currently unavailable. Throughput measured at 36.1 tokens/second. - Model architecture: The model comes in 64 shards (safetensors format:
model-00001-of-000064.safetensors). - Token efficiency: The model uses a custom chat template that preserves reasoning content (
preserve_thinking: true) and optimizes token usage by separating history and suffix messages. The template includes special tokens like<|im_user|>,<|im_assistant|>, and<|im_system|>for role management, and<think>/</think>blocks to encapsulate chain-of-thought reasoning. - Tool calling: Native support for tool calls with structured argument formatting, using
<|tool_call_begin|>and<|tool_call_end|>markers. - Community engagement: 334 likes on Hugging Face, with 4 HN comments and 41 points as of publication.
Practical Implications
The template design explicitly avoids embedding reasoning tokens in history when preserve_thinking is false, reducing context overhead. For developers using AI coding agents, this means lower token consumption per interaction — especially beneficial for long agentic loops where reasoning chains are repeated. The tool calling format is JSON-aligned, making it straightforward to integrate with existing function-calling pipelines.
The model is available for immediate use via Novita, and the Hugging Face repository includes full tokenizer config and template source.
📖 Read the full source: HN AI Agents
👀 See Also

Apple Using Google Gemini Access for On-Device AI Model Distillation
Apple has full access to Google's Gemini model for distillation, creating smaller on-device AI models for Siri and other features in iOS 27 without internet connectivity.

AI's PR Problem: Flat Wages, Soaring Capital, and Public Backlash
College wage premium flat for 25 years, S&P 500 up 380%. Workers see AI as theft enabler, leading to laws against data centers.

OpenClaw Hosts Its First AMA: Insights into AI Coding Agents
OpenClaw, a prominent figure in AI coding agents, hosted its first AMA on Reddit. The discussion shed light on its impacts, future plans, and challenges.

OpenClaw Founder Peter Steinberger on the Radar: YC Interview Insights
OpenClaw's founder, Peter Steinberger, catches the eye of YC, sparking discussions about the future of AI coding agents. Dive into the highlights of this significant chat that promises to influence the trajectory of automation and AI agent integration.