
FIELD DIGEST

The agent stack is decomposing: orchestration goes tiny, reliability goes visual, and QA chases the code AI writes.

Three tool launches this week point at the same structural shift. Agent orchestration no longer requires a full LLM. Agent control flow is moving from prompt chains to explicit state machines. And AI-generated code is producing failure patterns distinct enough to warrant its own lint category. The common thread: the first generation of agentic tooling assumed everything would run through a single large model. The second generation is decomposing that assumption into specialized layers.

This week's items

Needle makes agent orchestration pocketable (agents).

Cactus Compute open-sourced Needle, a 26M-parameter function-calling model distilled from Gemini. It runs prefill at 6,000 tokens per second on consumer devices, including budget phones. The model handles tool-use routing (deciding which function to call and extracting its arguments) without a full LLM. Today, agentic workflows typically need cloud calls or heavy local models just for the dispatch step. Needle separates orchestration from reasoning: run dispatch on-device with near-zero latency, and call larger models only when open-ended reasoning is needed. It is Apache 2.0 licensed. Edge agents, IoT automation, and mobile agentic apps just gained a missing layer.
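The dispatch/reasoning split can be sketched in a few lines of Python. This is an illustration of the pattern only, not Needle's API: `tiny_router`, `large_model`, and `set_timer` are hypothetical stand-ins, and the keyword rule below takes the place of a learned function-calling model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


def set_timer(minutes: int) -> str:
    """Example tool (hypothetical)."""
    return f"timer set for {minutes} min"


# Registry of tools the router is allowed to dispatch to.
TOOLS: dict[str, Callable[..., str]] = {"set_timer": set_timer}


def tiny_router(utterance: str) -> Optional[ToolCall]:
    """Stand-in for a small on-device function-calling model (hypothetical).

    A real router would be a learned model; this keyword rule only
    illustrates the interface: text in, tool name plus arguments out.
    """
    words = utterance.lower().split()
    if "timer" in words:
        minutes = next((int(w) for w in words if w.isdigit()), None)
        if minutes is not None:
            return ToolCall("set_timer", {"minutes": minutes})
    return None  # no tool matched: the request needs open-ended reasoning


def large_model(utterance: str) -> str:
    """Placeholder for a cloud or heavy local model call (hypothetical)."""
    return f"[escalated to large model] {utterance}"


def handle(utterance: str) -> str:
    call = tiny_router(utterance)
    if call is not None:
        return TOOLS[call.name](**call.args)  # dispatch stays on-device
    return large_model(utterance)  # escalate only when routing fails
```

With this shape, `handle("set a timer for 5 minutes")` is resolved locally, while `handle("why is the sky blue")` falls through to the larger model. The expensive call sits behind the cheap one, which is the layering the item describes.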

Statewright constrains agents to explicit state machines (agents).

Statewright defines agent control flow as visual state machines rather than prompt chains. The agent can only transition between explicitly defined states. This is a direct response to the brittle-agents problem: prompt-chained agents branch unpredictably because nothing constrains their state space. State machines make the failure modes enumerable, since every legal transition is declared up front. The approach trades flexibility for reliability, which is the right trade for production agents that must perform the same multi-step task consistently. Statewright was built by a 20-year GPU architecture veteran of NVIDIA and AMD, which suggests hardware-discipline thinking applied to software agent design.
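A minimal Python sketch shows what "only explicitly defined transitions" means in practice. It does not reflect Statewright's actual format or API; the state names, the `TRANSITIONS` table, and the `AgentMachine` class are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    PLAN = auto()
    ACT = auto()
    VERIFY = auto()
    DONE = auto()
    FAILED = auto()


# Every legal transition is listed here; anything else is rejected.
TRANSITIONS: dict[State, frozenset[State]] = {
    State.PLAN: frozenset({State.ACT, State.FAILED}),
    State.ACT: frozenset({State.VERIFY, State.FAILED}),
    State.VERIFY: frozenset({State.ACT, State.DONE, State.FAILED}),
    State.DONE: frozenset(),    # terminal
    State.FAILED: frozenset(),  # terminal
}


class IllegalTransition(Exception):
    """Raised when the agent requests a move the table does not allow."""


class AgentMachine:
    def __init__(self) -> None:
        self.state = State.PLAN
        self.history = [State.PLAN]

    def transition(self, target: State) -> None:
        # The check runs before any state change, so a rejected move
        # leaves the machine exactly where it was.
        if target not in TRANSITIONS[self.state]:
            raise IllegalTransition(f"{self.state.name} -> {target.name}")
        self.state = target
        self.history.append(target)
```

Because the table is finite, the full set of possible paths can be inspected ahead of time. An agent that tries to jump from `PLAN` straight to `DONE` raises `IllegalTransition` instead of silently skipping verification.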

React Doctor lints what AI agents write (tooling).

Million.co released a static analysis tool designed specifically to catch the patterns AI coding agents produce in React code. This marks the emergence of a category: AI-generated code produces failure patterns distinct enough from those of human-written code that it needs its own lint rules. The same dynamic played out with auto-generated code in earlier eras: ORM output, code generators, and template engines each eventually spawned specialized validation. React Doctor is one of the first tools to treat AI-authored code as a distinct dialect requiring its own quality gate. Expect this pattern to replicate across frameworks and languages.

AMD closes the local inference gap (inference).

The Luce inference engine now delivers speedups of 2.2x on decode and 3x on prefill over llama.cpp on AMD Strix Halo integrated GPUs, running 27B-class models. AMD's integrated graphics are becoming a viable local inference platform, not a budget compromise. The cost-performance gap between discrete NVIDIA GPUs and AMD's integrated silicon is narrowing faster than most purchasing cycles account for. This is a platform-competition signal: the assumption that serious local AI workloads require NVIDIA discrete cards is being tested as AMD's integrated hardware gets fast enough to run production-scale models.

OpenHuman positions as private local-first AI (agents).

OpenHuman is a self-hosted personal AI system positioning itself as a private alternative to cloud assistants. The architecture promises local-first agent workflows beyond chat. Whether OpenHuman delivers on that promise is an open question, but the positioning reflects a real market signal alongside Needle and Statewright: the assumption that useful AI requires cloud infrastructure is being tested from multiple directions simultaneously. Privacy-sovereign AI assistants are becoming a product category, not a hobbyist ambition. The convergence of tiny orchestration models, local inference performance, and self-hosted agent frameworks is making the local-first stack plausible for the first time.

The monolithic cloud-LLM architecture is splitting into specialized layers: tiny models for dispatch, state machines for control flow, dedicated linters for output quality, and integrated GPUs for local compute. Each layer is becoming independently addressable. That is how platforms mature.