Anthropic quietly changed a number that might quietly change your bill.
Top Signal
Anthropic Silently Cut Claude Cache TTL from 1 Hour to 5 Minutes
platform change
HN Front Page
A GitHub issue revealed that Anthropic reduced the prompt cache TTL from 60 minutes to 5 minutes on March 6th, with no announcement. For builders running agentic loops or multi-turn conversations that rely on prompt caching to keep costs manageable, cached prefixes now expire 12x faster, sharply increasing cache misses and API spend. The issue sits at 104 HN points with growing frustration. If you're building on the Claude API, audit your caching assumptions immediately: long-running agent sessions that assumed hour-long cache windows are now paying full input token prices on most turns. Consider restructuring prompts to front-load stable content, batching requests within 5-minute windows, or adding a client-side caching layer. This is a material cost increase disguised as a configuration change.
Read more →
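To size the impact for your own workload, here's a back-of-envelope sketch (not Anthropic's billing logic). The 1.25x cache-write and 0.1x cache-read multipliers follow Anthropic's published prompt-caching rates, and the token counts and base price are illustrative; check current pricing before relying on the numbers.

```python
# Estimate per-session input cost for a cached prompt prefix when the
# cache TTL shrinks. Assumes cache writes bill at 1.25x base input price
# and cache reads at 0.1x, per Anthropic's published caching rates.

def session_input_cost(prefix_tokens, turns, turn_gap_s, ttl_s,
                       base_price_per_mtok=3.00):
    """Cost of the cached prefix across a multi-turn session.

    Each turn either hits the cache (gap < TTL -> cheap read) or misses
    (the prefix is re-written at the cache-write surcharge).
    """
    write = base_price_per_mtok * 1.25   # cache-write surcharge
    read = base_price_per_mtok * 0.10    # cache-read discount
    mtok = prefix_tokens / 1_000_000
    cost = write * mtok                  # first turn always writes
    for _ in range(turns - 1):
        if turn_gap_s < ttl_s:
            cost += read * mtok          # cache hit
        else:
            cost += write * mtok         # expired: pay to re-cache
    return cost

# 50k-token prefix, 20 turns, 8 minutes between turns
old = session_input_cost(50_000, 20, 480, 3600)  # 1-hour TTL
new = session_input_cost(50_000, 20, 480, 300)   # 5-minute TTL
print(f"1h TTL: ${old:.2f}  5m TTL: ${new:.2f}")
```

With an 8-minute gap between turns, every turn misses under the new 5-minute TTL, so the same session costs roughly 8x more in prefix tokens.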
Fast Signals
DFlash Speculative Decoding Hits 85 tok/s on Apple Silicon
workflow
r/LocalLLaMA
A new speculative decoding implementation achieves a 3.3x speedup running Qwen3.5-9B on an M5 Max via MLX. If you're building local-first AI products on Apple hardware, this narrows a real gap between cloud and edge inference speed. Worth benchmarking against your current MLX setup.
Link →
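If the technique is new to you, here's the core idea in a toy form: a cheap draft model proposes a block of tokens, the target model verifies them, and you keep the longest agreeing prefix. This is not DFlash's implementation; both "models" below are deterministic stand-in functions over token lists, purely for illustration.

```python
# Toy greedy speculative decoding. A real implementation verifies all k
# draft positions in a single target forward pass; here each check is a
# separate call, but the accept/reject logic is the same.

def speculative_decode(target, draft, prompt, k=4, max_new=12):
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively (cheap model, many calls).
        ctx = list(out)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Keep the longest prefix of the proposal the target agrees with.
        accepted = 0
        for i, t in enumerate(proposal):
            if target(out + proposal[:i]) == t:
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        # Target contributes one token either way: the correction at the
        # first mismatch, or a bonus token if everything was accepted.
        out.append(target(out))
        out = out[:len(prompt) + max_new]  # trim overshoot
    return out

# Target counts up by 1; draft agrees except it stumbles on multiples of 5.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + (2 if ctx[-1] % 5 == 0 else 1)
print(speculative_decode(target, draft, [0]))
```

The output still matches pure target decoding; the speedup in real systems comes from verifying a whole block per expensive target pass instead of one token.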
NVIDIA AITune Auto-Selects Fastest Inference Backend for PyTorch
new tool
r/LocalLLaMA
NVIDIA released AITune, which benchmarks your specific model against available backends (TensorRT, torch.compile, etc.) and picks the fastest path automatically, eliminating the manual trial and error of inference optimization. If you deploy PyTorch models on NVIDIA hardware, this replaces hours of backend configuration with a single call.
Link →
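The pattern behind tools like this is simple enough to sketch: time each candidate execution path on representative input and keep the fastest. The backend names below are illustrative placeholders, not AITune's API.

```python
# Benchmark-and-pick: run each candidate backend on a sample input,
# average wall-clock time over a few runs, and return the fastest.
import time

def pick_fastest(backends, sample_input, warmup=2, runs=5):
    """backends: dict name -> callable. Returns (best_name, timings)."""
    timings = {}
    for name, fn in backends.items():
        for _ in range(warmup):          # warm caches / JIT before timing
            fn(sample_input)
        start = time.perf_counter()
        for _ in range(runs):
            fn(sample_input)
        timings[name] = (time.perf_counter() - start) / runs
    best = min(timings, key=timings.get)
    return best, timings

# Stand-in "backends" with very different costs
backends = {
    "eager": lambda x: sum(i * i for i in range(x)),
    "fused": lambda x: x * (x - 1) * (2 * x - 1) // 6,  # closed form
}
best, times = pick_fastest(backends, 50_000)
print(best)
```

Warmup runs matter in practice: compiled backends like TensorRT or torch.compile pay a one-time compilation cost that would otherwise poison the timing.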
Berkeley Researchers Broke Top AI Agent Benchmarks — Here's How
research to practice
HN Front Page
UC Berkeley's RDI group demonstrated that leading agent benchmarks (SWE-bench, WebArena, etc.) are fragile and gameable. Minor prompt engineering and environment-specific tricks inflated scores without genuine capability improvement. If you're evaluating agent frameworks, don't trust benchmark leaderboards — build your own eval suite against your actual use cases.
Link →
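A homegrown eval suite doesn't need to be elaborate to beat a leaderboard you can't trust. Here's a minimal harness of the kind the Berkeley result argues for; the agent interface (a callable taking a task string) and the toy cases are illustrative assumptions, not their methodology.

```python
# Minimal eval harness: run an agent over your own cases and report a
# pass rate, instead of trusting a public benchmark score.

def run_evals(agent, cases):
    """cases: list of (task, check) where check(answer) -> bool."""
    results = [(task, check(agent(task))) for task, check in cases]
    passed = sum(ok for _, ok in results)
    return passed / len(results), results

# Toy agent: "solves" arithmetic tasks by evaluating the expression.
toy_agent = lambda task: str(eval(task))

cases = [
    ("2+2", lambda a: a == "4"),
    ("7*6", lambda a: a == "42"),
    ("10/4", lambda a: a == "2.5"),
]
score, details = run_evals(toy_agent, cases)
print(f"pass rate: {score:.0%}")
```

Swap the toy agent for your framework's entry point and the cases for real tasks from your product; the per-case `check` functions are where your domain knowledge lives.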
MiniMax M2.7 Drops — 754B Parameters, Restrictive License
emerging signal
r/LocalLLaMA
MiniMax released M2.7, a massive 754B-parameter model with strong benchmarks. But the license bans commercial use without written permission, covering paid services, APIs, and fine-tuned deployments. GGUF quants are already available from Q2 to BF16. Evaluate for research only — the license makes this DOA for production builders.
Link →
SQLite 3.53.0 Ships ALTER TABLE ADD/DROP COLUMN, QRF Rendering
platform change
Simon Willison
Major SQLite release (3.52 was withdrawn, so this is a double batch). ALTER TABLE can now add and drop columns properly, and a new query result formatter (QRF) provides built-in rendering options. If SQLite is your app database, the ALTER TABLE improvements remove a long-standing pain point for schema migrations.
Link →
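Both statements work from Python's bundled sqlite3 module. ADD COLUMN has been supported for a long time and DROP COLUMN landed in SQLite 3.35, so check `sqlite3.sqlite_version` if you're on an older bundled library.

```python
# Schema migration with ALTER TABLE ADD/DROP COLUMN in SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("ALTER TABLE users ADD COLUMN email TEXT")   # add a column
con.execute("ALTER TABLE users DROP COLUMN name")        # drop one
cols = [row[1] for row in con.execute("PRAGMA table_info(users)")]
print(cols)
```

DROP COLUMN still carries restrictions (you can't drop a PRIMARY KEY or UNIQUE column, or one referenced by an index or view), so migrations on constrained columns may still need the old create-copy-rename dance.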
Radar
Meta's Neural Computers: Video Models as OS Simulators
Meta published research training video models to generate realistic terminal and desktop simulations. Early results are rough, but the direction — AI that can visually simulate entire computing environments — has implications for agent testing and synthetic training data.
Link →
Qwen 3.5 Weight Drift Fix Tool
A community-built automated tool detects and corrects weight drift in Qwen 3.5 models after extended fine-tuning. Needle-in-a-haystack (NIAH) results are inconclusive, but the tool itself addresses a real problem builders hit when fine-tuning large models locally.
Link →
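One plausible core of such a check, sketched here with flat Python lists standing in for real tensors: relative L2 distance between a reference checkpoint and the current weights, flagged per tensor above a threshold. The metric and threshold are illustrative assumptions, not the linked tool's algorithm.

```python
# Flag tensors whose weights have drifted from a reference checkpoint,
# measured as ||current - reference|| / ||reference|| per tensor.
import math

def drift_report(reference, current, threshold=0.05):
    """reference/current: dict name -> list of floats. Returns flagged names."""
    flagged = {}
    for name, ref in reference.items():
        cur = current[name]
        num = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, cur)))
        den = math.sqrt(sum(a * a for a in ref)) or 1.0
        rel = num / den
        if rel > threshold:
            flagged[name] = rel
    return flagged

ref = {"attn.q": [1.0, 2.0, 3.0], "mlp.w": [0.5, -0.5]}
cur = {"attn.q": [1.0, 2.0, 3.01],   # tiny change: below threshold
       "mlp.w":  [0.9, -0.5]}        # large change: flagged
print(drift_report(ref, cur))
```

In a real setup you'd load both checkpoints' state dicts and run this per named tensor; "correction" could then be as simple as interpolating flagged tensors back toward the reference.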
Practical Open-Source Agent Stack for 2026
An r/LocalLLaMA post cuts through hype to document what actually works: Hermes Agent (24k stars, MIT), local SQLite memory, Telegram/Discord connectors. Worth bookmarking if you're evaluating self-hosted agent frameworks beyond the usual LangChain/CrewAI options.
Link →
Convergence Watch
agent management platforms
TRENDING
4 mentions across GitHub Trending, HN Show, r/LocalLLaMA
Multica, Rowboat, and Eve all shipped this week alongside community discussions of production agent stacks. The 'managed agent layer' pattern — assign tasks, track progress, persistent memory — is crystallizing into a distinct product category. This was a signal yesterday and is accelerating.
supply chain compromise
TRENDING
3 mentions across HN Front Page, Simon Willison
JSON Formatter Chrome extension turned adware, CPU-Z/HWMonitor site hijacked — joining last week's triple compromise. Developer tool supply chains remain under sustained attack. Audit your browser extensions and downloaded binaries. This is now a persistent threat pattern, not isolated incidents.
minimax m2.7
7 mentions across r/LocalLLaMA
Seven separate posts in 12 hours across release announcements, GGUF quants, and license criticism. High community interest but the restrictive commercial license dampens builder utility significantly. Watch for license clarification from MiniMax.