Oak rewrites version control for agents; a 0.2B model matches 10B inpainting; and GLM-5.2 keeps pulling attention.
Top Signal
Oak: Version control redesigned for autonomous agents
new tool
HN Front Page
Oak (oak.space) is a new VCS built around how agents, rather than humans, actually interact with code. The key primitive is the virtual mount, which lets an agent access any file in any repository without cloning the full history. For cloud-based agent pipelines, this eliminates the cold-start cost of cloning large repos on every task, a real friction point when you're running hundreds of parallel agents. Oak also exposes structured repo metadata that agents can query programmatically instead of parsing `git log` shell output. It's an early Show HN (live code at the link), but the architectural insight is sound: agents don't navigate repos the way developers do, and git's human-optimized surface creates unnecessary overhead. If you're building multi-agent coding workflows or cloud-native code generation, start evaluating the virtual mount model now; this is the kind of primitive that tends to become standard infrastructure.
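For contrast, here is a minimal sketch of the status quo this item describes: an agent shelling out to `git log` and turning text into records. The format placeholders (`%H`, `%an`, `%aI`, `%s`, `%x1f`) are standard git pretty-format codes; the `Commit` record and both function names are hypothetical. Oak's own query API is not shown because the source does not describe it.

```python
import subprocess
from dataclasses import dataclass

# ASCII unit separator: keeps fields unambiguous even if a subject has tabs or pipes.
FIELD_SEP = "\x1f"
# %H = commit hash, %an = author name, %aI = author date (strict ISO 8601), %s = subject.
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


@dataclass(frozen=True)
class Commit:  # hypothetical record type, for illustration only
    sha: str
    author: str
    date: str
    subject: str


def parse_git_log(text: str) -> list[Commit]:
    """Turn `git log --pretty=format:LOG_FORMAT` output into Commit records."""
    commits = []
    for line in text.split("\n"):
        if not line:
            continue
        # maxsplit=3 so a separator inside the subject cannot create extra fields.
        sha, author, date, subject = line.split(FIELD_SEP, 3)
        commits.append(Commit(sha, author, date, subject))
    return commits


def recent_commits(repo_path: str, limit: int = 20) -> list[Commit]:
    """Spawn git in an already-cloned repo and parse its text output."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"-{limit}", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_git_log(out)
```

Every call pays for a local clone, a process spawn, and a text round-trip, which is the overhead a queryable metadata layer would aim to remove.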
Read more →
Fast Signals
Moebius: 0.2B inpainting model matches 10B-level quality
research to practice
HN Front Page, r/LocalLLaMA
HUST released Moebius, a 0.2B diffusion model for image inpainting that reportedly reaches 10B-parameter-level benchmark quality by concentrating compute only on masked regions. It appeared independently on HN (213 points) and r/LocalLLaMA. If you're building image editing into a product and don't want to host a 10B+ model, this is the first thing to benchmark.
Link →
Claude Code's Extended Thinking output is not authentic inner monologue
platform change
HN Front Page
A detailed analysis argues the text shown in Claude Code's Extended Thinking mode is post-hoc rationalization, not the actual compute path. This directly affects anyone using extended thinking output as a debug signal or building explainability layers on top of it. Treat it as structured summarization, not a faithful trace of model reasoning.
Link →
Top-N-Sigma: New sampler PR replaces softmax+sort in llama.cpp
new tool
r/LocalLLaMA
PR #22645 introduces Top-N-Sigma sampling to llama.cpp. It drops the unconditional softmax+sort and instead uses a standard-deviation threshold to filter the logit distribution. Early community tests show improved coherence on long outputs compared with top-p. Watch for the merge; it's worth A/B testing against your current sampler settings once it lands.
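As a rough illustration of a standard-deviation threshold on logits, the sketch below assumes the rule "keep every token whose logit is within n standard deviations of the maximum logit", then applies softmax only to the survivors. This is not the code from the PR, and the actual implementation may differ; the function names and the default `n` are hypothetical.

```python
import math
import random


def top_n_sigma_filter(logits: list[float], n: float = 1.0) -> list[int]:
    """Return indices of tokens whose logit is >= max(logits) - n * std(logits)."""
    mean = sum(logits) / len(logits)
    # Population standard deviation of the raw logits (no softmax needed).
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    # Single linear pass; no sort over the vocabulary.
    return [i for i, x in enumerate(logits) if x >= threshold]


def sample_top_n_sigma(logits, n=1.0, temperature=1.0, rng=random):
    """Sample one token index from the tokens that survive the filter."""
    kept = top_n_sigma_filter(logits, n)
    # Softmax over the survivors only, shifted by the peak for numerical stability.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Under this assumed rule the kept set does not change with temperature, because the maximum and the standard deviation scale by the same factor; temperature only reshapes probabilities among the survivors.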
Link →
NEX-N2-mini fine-tune fixes Qwen3.5/3.6 overthinking
new tool
r/LocalLLaMA
A community fine-tune via the Apostate framework cuts Qwen3.5-MoE and Qwen3.6 reasoning verbosity significantly while preserving coding performance. If you're running Qwen3.5 or 3.6 locally and burning tokens on unnecessary chain-of-thought loops, this is a drop-in swap worth testing before switching model families.
Link →
Recall: Persistent project memory for Claude Code sessions
new tool
HN Show
Recall (Show HN, 127 points, 81 comments) gives Claude Code persistent project memory across sessions: architecture decisions, naming conventions, and recurring context, stored as structured files that Claude Code picks up automatically. It's directly useful today if you're doing heavy Claude Code work on large codebases where re-explaining context per session is the main friction.
Link →
System prompts from Claude, GPT-5.5, Gemini, Codex now indexed
emerging signal
GitHub Trending
A GitHub Trending repo aggregates extracted system prompts from Claude Fable 5, Opus 4.8, Claude Code, Claude Design, GPT-5.5 Thinking, Codex, Gemini 3.5 Flash, Grok, Cursor, Copilot, and Perplexity. High-signal reference for understanding how frontier labs structure tool-use, persona boundaries, and safety instructions — directly applicable to your own agent system prompt engineering.
Link →
TMax: Minimal architecture for high-performing terminal agents
research to practice
r/LocalLLaMA
The TMax paper proposes constraining agent action spaces to terminal commands, rewarding completion density, and eliminating orchestration overhead, and claims this outperforms more complex multi-agent frameworks on coding benchmarks. It's a useful architectural reference if you're building CLI-native agents or trying to simplify an overly complex pipeline.
Link →
Radar
Ling & Ring 2.6: Trillion-param MoE built for agentic latency
A technical report has dropped for Ling and Ring 2.6, a trillion-parameter MoE claiming 'instant agentic intelligence' via architectural choices that prioritize low first-token latency. The report is worth reading for its specific design decisions: efficient agentic inference at scale is an unsolved problem, and this is one of the few public technical writeups on it.
Link →
vLLM 2-3x faster than llama.cpp on dual AMD R9700s
A side-by-side benchmark on dual Radeon R9700s shows vLLM delivering 2-3x higher throughput than llama.cpp on either ROCm or Vulkan for Qwen3.6. If you're running AMD inference and defaulted to llama.cpp, you may be leaving significant throughput on the table; it's worth re-benchmarking your stack.
Link →
Deno Desktop: Native desktop apps without Electron
Deno Desktop ships a way to package web apps as native desktop applications using Deno's Rust-based runtime — significantly lighter than Electron. Worth watching for teams building local AI tooling who want a distributable desktop app without the 200MB Electron tax.
Link →
Convergence Watch
glm-5.2
TRENDING
7 mentions across r/LocalLLaMA, HN Front Page, GitHub Trending, Simon Willison
GLM-5.2 has now appeared across 4 independent sources on 6 of the last 7 days. Today's community focus has shifted from evaluation to deployment: multi-GPU speed benchmarks (5090+3090Ti, 4x3090), DeepSWE leaderboard placement, and direct head-to-head comparisons with Claude Opus. The model is entering the mainstream local stack.
gemma-4-qat
3 mentions across r/LocalLLaMA
Two independent posts today (31B and 27B variants) confirm Gemma 4 QAT tolerates aggressive KV cache quantization far better than standard post-training quants — consistent with yesterday's finding. Practical implication: on memory-constrained hardware, Gemma 4 QAT lets you compress the KV cache harder without quality collapse, effectively extending usable context length on the same VRAM.
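To make the context-length arithmetic concrete, here is a back-of-the-envelope sketch. The layer count, KV-head count, and head dimension are hypothetical placeholders, not Gemma 4's real configuration, and the estimate assumes every layer caches the full context (models with sliding-window layers need less). The bits-per-element values reflect the llama.cpp block layouts for f16, q8_0, and q4_0.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bits_per_element: float) -> float:
    """Bytes of KV cache one token occupies: keys plus values across all layers."""
    elements = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K and one V tensor
    return elements * bits_per_element / 8


def max_context_tokens(vram_budget_bytes: int, **model_dims) -> int:
    """Largest whole number of tokens whose KV cache fits in the budget."""
    return int(vram_budget_bytes // kv_bytes_per_token(**model_dims))


# Hypothetical model shape, for illustration only.
DIMS = dict(n_layers=48, n_kv_heads=8, head_dim=128)
BUDGET = 8 * 1024**3  # 8 GiB reserved for the KV cache

# q8_0 stores 32 int8 values plus an fp16 scale (8.5 bits/element);
# q4_0 stores 32 4-bit values plus an fp16 scale (4.5 bits/element).
for name, bits in [("f16", 16.0), ("q8_0", 8.5), ("q4_0", 4.5)]:
    tokens = max_context_tokens(BUDGET, bits_per_element=bits, **DIMS)
    print(f"{name:>5}: {tokens} tokens")
```

With these placeholder dimensions, moving from f16 to q4_0 fits roughly 3.5x as many tokens in the same budget, which only helps if quality holds at that cache precision.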