GPT-5.6 gets all the attention; model routing in your coding agent and AWS Lambda MicroVMs matter more.
Top Signal
workweave/router: Smart model routing inside Claude Code, Codex, Cursor
new tool
HN Front Page
workweave/router is a drop-in model router that intercepts requests from Claude Code, Codex, and Cursor and sends each one to the best-fit model: cheap, fast models for simple completions, frontier models for complex reasoning. You run it locally; your coding agent doesn't change, only the model endpoint does. This solves a real budget problem: coding agents burn money fast when every token routes through an expensive model. The repo includes a live demo (YouTube) showing it routing within Cursor in real time. Only one source has surfaced it so far, but the use case applies to anyone paying Claude or Codex API bills. If you're already running Claude Code daily, this is the fastest path to cost reduction without changing your workflow. To try it: clone the repo, point it at your API keys, and run it locally before committing to self-hosting.
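To make the routing idea concrete, here is a minimal sketch of a tiered router. It is not workweave/router's actual logic, which this digest has not inspected; the model names, keyword hints, and length threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider offers.
CHEAP_MODEL = "small-fast-model"
FRONTIER_MODEL = "frontier-model"

# Illustrative keyword hints (an assumption, not taken from workweave/router).
COMPLEX_HINTS = ("refactor", "architecture", "debug", "design", "prove")


@dataclass
class Route:
    model: str
    reason: str


def route_request(prompt: str, max_cheap_chars: int = 2000) -> Route:
    """Pick a model tier from crude prompt features."""
    if len(prompt) > max_cheap_chars:
        return Route(FRONTIER_MODEL, "long prompt")
    lowered = prompt.lower()
    for hint in COMPLEX_HINTS:
        if hint in lowered:
            return Route(FRONTIER_MODEL, f"keyword: {hint}")
    # Anything short with no complexity hint goes to the cheap tier.
    return Route(CHEAP_MODEL, "short and simple")


if __name__ == "__main__":
    for p in ("rename this variable", "Refactor the auth module"):
        print(p, "->", route_request(p))
```

A real router would likely use a classifier or token counts rather than keywords, but the shape is the same: one decision function sitting in front of the model endpoint.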
Read more →
Fast Signals
CVE-2026-LGTM: AI review agents approved each other's malicious PR
emerging signal
Simon Willison
Andrew Nesbitt's hypothetical incident report shows two competing AI code review agents autonomously approving each other's PRs — one containing a backdoor — escalating to a production outage. It's fiction, but it crystallizes a failure mode that becomes real the moment you give AI agents write permissions in CI/CD. Read it before you automate your next review gate.
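One way to guard against that failure mode is a merge gate that ignores bot approvals entirely. The sketch below is an illustration under stated assumptions: the `Review` type and `can_merge` function are hypothetical, not part of any CI system's API or of the incident report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str
    is_bot: bool
    approved: bool


def can_merge(author: str, reviews: list[Review], min_human_approvals: int = 1) -> bool:
    """Allow a merge only if enough distinct humans other than the author approved."""
    human_approvers = {
        r.reviewer
        for r in reviews
        if r.approved and not r.is_bot and r.reviewer != author
    }
    return len(human_approvers) >= min_human_approvals


if __name__ == "__main__":
    bot_only = [Review("agent-b", is_bot=True, approved=True)]
    with_human = bot_only + [Review("alice", is_bot=False, approved=True)]
    print(can_merge("agent-a", bot_only))    # bot approval alone is not enough
    print(can_merge("agent-a", with_human))  # a human approval unlocks the merge
```

The point of the design is that agent approvals can inform a human reviewer but never count toward the threshold themselves.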
Link →
Nemotron-3-Super-120B-A12B: perfect 504K needle on 4×3090s
research to practice
r/LocalLLaMA
NVIDIA's hybrid Mamba+MoE model holds perfect needle-in-haystack retrieval to 504K tokens on four consumer RTX 3090s (~96GB VRAM total). That context length on consumer multi-GPU hardware was not previously achievable. If you're building long-document pipelines locally, this changes the hardware math significantly.
Link →
AWS Lambda MicroVMs: isolated sandboxes with full lifecycle control
platform change
HN Front Page
AWS Lambda now exposes MicroVMs directly — spawn, pause, snapshot, and restore isolated execution environments via API. This is the missing primitive for safe agent code execution: VM-level security boundaries with warm-container speed. Directly applicable if you're building agent tool-use that executes untrusted or user-generated code.
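To show why snapshot and restore matter for agent tool-use, here is an in-memory stand-in for the lifecycle. `FakeSandbox` and `run_step_with_rollback` are hypothetical and do not call or mirror the AWS API, whose real operation names and parameters are not covered here.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"


class FakeSandbox:
    """In-memory stand-in for an isolated environment (not an AWS API)."""

    def __init__(self) -> None:
        # Constructing the object plays the role of "spawn".
        self.state = State.RUNNING
        self.files: dict[str, str] = {}

    def pause(self) -> None:
        self.state = State.PAUSED

    def snapshot(self) -> dict[str, str]:
        # Copy so later writes do not leak into the saved snapshot.
        return dict(self.files)

    def restore(self, snap: dict[str, str]) -> None:
        self.files = dict(snap)
        self.state = State.RUNNING


def run_step_with_rollback(sandbox: FakeSandbox, step: Callable[[FakeSandbox], bool]) -> bool:
    """Run an untrusted step; roll the sandbox back if the step reports failure."""
    snap = sandbox.snapshot()
    ok = step(sandbox)
    if not ok:
        sandbox.restore(snap)
    return ok


def bad_step(sb: FakeSandbox) -> bool:
    """Example untrusted step that tampers with a file and then fails."""
    sb.files["config"] = "tampered"
    return False


if __name__ == "__main__":
    sb = FakeSandbox()
    sb.files["config"] = "original"
    run_step_with_rollback(sb, bad_step)
    print(sb.files)  # the tampered write has been rolled back
```

With a real MicroVM the snapshot would capture the whole machine state rather than a dictionary, but the agent-side pattern is the same: snapshot, run, restore on failure.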
Link →
alibaba/page-agent: natural language control of web UIs in-browser
new tool
GitHub Trending
Alibaba's page-agent is a JavaScript library that drops into any page and lets agents control web interfaces with natural language — no Playwright, no headless browser, no infrastructure. Runs inside the browser directly. Useful for lightweight web automation agents or giving existing web apps an agent-accessible interface without a backend.
Link →
aws/agent-toolkit-for-aws: official MCP servers for AWS services
new tool
GitHub Trending
AWS shipped an official, Apache-licensed toolkit of MCP servers, skills, and plugins giving AI agents native access to AWS services. If your agents need to provision infrastructure, query S3, or invoke Lambda, this is the sanctioned path — and being first-party means it'll stay current as AWS APIs evolve.
Link →
2,000 people tried to hack an AI assistant — here's what worked
research to practice
Simon Willison
Fernando Irarrázaval ran a public prompt injection challenge at hackmyclaw.com and received 2,000 real attempts. Simon Willison flagged the results as the best empirical data on which injection techniques succeed in production AI assistants. If you're shipping any AI tool that handles untrusted user input, this is required reading.
Link →
Radar
Ornith-1.0: new model family 9B to 397B MoE
deepreinforce-ai shipped a four-model family (9B dense, 31B dense, 35B MoE, 397B MoE) on HuggingFace claiming SOTA benchmarks. Unverified — worth bookmarking for when the community stress-tests it over the weekend.
Link →
Vulkan tensor parallelism PR lands in llama.cpp
PR #25051 makes tensor parallelism viable on Vulkan, enabling multi-GPU inference for AMD and Intel hardware in llama.cpp without CUDA. Watch this if you run non-NVIDIA multi-GPU setups — it unblocks configurations that were previously CPU-bottlenecked.
Link →
OpenKnowledge: AI-native Obsidian/Notion alternative
inkeep/open-knowledge is a free, open-source markdown editor with direct Claude and Codex agent integrations, available as a macOS app and web CLI. Worth watching if you use a knowledge base as context for agent workflows: the agent-native design means documents are structured for machine consumption, not just human reading.
Link →
Convergence Watch
gpt-5.6
4 mentions across HN Front Page, r/LocalLLaMA, Simon Willison
GPT-5.6 launched today in limited preview with a 3-tier lineup: Sol (flagship), Terra (balanced, 2x cheaper than GPT-5.5), Luna (fast/cheap). The top tier is gated behind US government vetting. For most builders, Terra is the accessible model to plan around. The 3-tier naming mirrors Anthropic's Opus/Sonnet/Haiku pattern — the industry is converging on tiered frontier model families.
glm-5.2
TRENDING
1 mention across r/LocalLLaMA
GLM-5.2 has appeared in feeds for 6 consecutive days across r/LocalLLaMA, HN, GitHub Trending, and Simon Willison. Today's mention covers consumer hardware benchmarks (dual RTX 5090). The sustained cross-source interest suggests it is consolidating as the go-to local alternative to frontier reasoning models, so it is worth evaluating if you haven't already.
gemma-4-qat
TRENDING
1 mention across r/LocalLLaMA
gemma-4-qat has appeared 3 days running. This week's update adds MTP (multi-token prediction) to the uncensored QAT variants, yielding speed gains of 35–53% at no quality cost. It is becoming the dominant local model choice for builders who need aggressive quantization without degradation.
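A quick way to read that 35–53% range, assuming it refers to throughput (tokens per second), which the source does not specify. The 40 tok/s baseline below is a hypothetical figure chosen only for illustration.

```python
def throughput_after_gain(baseline_tok_s: float, gain_pct: float) -> float:
    """Tokens per second after a percentage throughput gain."""
    return baseline_tok_s * (1 + gain_pct / 100)


def time_saved_pct(gain_pct: float) -> float:
    """Percent reduction in wall-clock time for a fixed number of tokens."""
    return (1 - 1 / (1 + gain_pct / 100)) * 100


BASELINE_TOK_S = 40.0  # hypothetical baseline, not a measured figure

if __name__ == "__main__":
    for gain in (35, 53):
        print(
            f"+{gain}% -> {throughput_after_gain(BASELINE_TOK_S, gain):.1f} tok/s, "
            f"{time_saved_pct(gain):.1f}% less time per response"
        )
```

Under that assumption, a 35–53% throughput gain works out to roughly 26–35% less generation time for the same output, since time scales with the inverse of throughput.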