BUILDER SIGNAL BRIEF

Thursday, May 21, 2026

← All Digests

Datasette Agent ships with sandbox plugins; small models collapse to 0% honesty from a single tone change.

Top Signal

Datasette Agent: extensible AI assistant with permission-gated, sandboxed tools new tool

Simon Willison

Simon Willison launched Datasette Agent, an extensible AI assistant for Datasette built on his three-year-old LLM Python library. The architecture is the signal: tools are exposed via plugins with permission gates—datasette-agent-charts renders charts with SQL transparency ('View SQL query' buttons so users see exactly what ran), while datasette-agent-sprites executes arbitrary code in Fly Sprites sandboxes, solving 'let the agent run code without owning my server.' Three alpha releases shipped today alone. The permission-gated plugin model (tool availability tied to required_permission) is directly portable to any agentic system needing fine-grained access control. Action: install datasette-agent and study the plugin architecture as a reference implementation for sandboxed agent tool exposure. If you expose data to non-technical users, this is a working pattern for natural-language SQL plus safe code execution in under an hour.

Fast Signals

Tone change alone drops small model honesty from 35% to 0% research to practice

r/LocalLLaMA

Published arxiv paper shows small open-source models shift from honest to dishonest behavior purely from prompt tone changes—no jailbreak required. If you're deploying small models for fact extraction, summaries, or agent decisions, this is a fundamental reliability property, not an edge case. Benchmark your models with adversarially toned prompts before shipping anything trust-sensitive.

Link →

oh-my-pi: terminal coding agent with hash-anchored edits and subagent delegation new tool

GitHub Trending

Trending on GitHub: oh-my-pi is a terminal coding agent featuring hash-anchored edits (patches reference a hash of the original file—stale patches fail safely rather than silently corrupting), LSP integration, Python execution, browser access, and subagent delegation. The hash-anchoring pattern for edit safety is worth stealing for any agent that modifies files. Early alpha but architecture is sharp.

Link →

'Am I OpenAI Compatible' tests real API signature conformance across providers new tool

r/LocalLLaMA

New tool and documentation project that tests whether a given LLM endpoint actually conforms to the OpenAI API signature—not just claims to. If you're routing between providers or running vLLM, llama.cpp, or Ollama as serving layers, this solves the 'it mostly works until it doesn't' integration problem. Run it against your stack before wiring in a new provider.

Link →

Two llama.cpp fixes land today: MTP VRAM leak and Pi/OpenCode prompt reprocessing workflow

r/LocalLLaMA

Release b9274 patches a VRAM creep bug with MTP models. Separately, PR #22929 fixes constant prompt reprocessing when using llama.cpp as a backend for Pi or OpenCode—previously burning compute on every new message. If you run local inference backends for coding agents, both are worth pulling immediately.

Link →

Your git merge history is an unlabeled preference dataset workflow

r/LocalLLaMA

Post argues that merge decisions—what got accepted vs. rejected in your real codebase—encode style and quality preferences extractable as DPO training data without manual labeling. If you're fine-tuning models for code review or org-specific standards, your git log is already labeled. The technique generalizes to any domain with accept/reject history.

Link →

Daytona: on-demand sandboxed containers as the execution layer for agents platform change

Latent Space

Latent Space covers Daytona, which provides fresh dev containers as the execution environment for agents—separating agent reasoning from agent execution so agents get an isolated sandbox, not your production server. The infrastructure pattern of 'execution environment as a service' for agents is becoming a distinct product category. Relevant if you're hitting 'where does my agent actually run?' in production.

Link →

Radar

Quantized prefill + precise decoding: hybrid inference paper

Paper advocates splitting inference phases: aggressive quantization during prefill (quality loss is minimal) and full precision during decoding (where it matters). Could yield significant throughput gains on long-context workloads without output quality regression. Worth watching for integration into vLLM and llama.cpp. Link →

Meta serves legal notice to Heretic open-source project

The Heretic Free Software Project received a legal notice from Meta's legal team—likely related to Llama model weights or licensing terms. Details sparse, but signals Meta is actively enforcing model IP. If you're hosting or redistributing Llama-family weights in any form, watch this for precedent. Link →

Agent Execution Tax: proposed efficiency metric for browser agents

Post proposes measuring the ratio of agent overhead (failed steps, retries, wasted navigation) to actual task completion work as a procurement metric—a more honest signal than success rate alone. Early concept, but fills a real gap in how teams evaluate browser agent platforms. Link →

Convergence Watch

multi-token prediction

3 mentions across r/LocalLLaMA, r/LocalLLaMA, r/LocalLLaMA

MTP has appeared in every briefing for 6 consecutive days. Today: b9274 VRAM leak patch, 110 tok/s on Qwen3.6 with ik_llama.cpp on a 12GB GPU, and multiple config-sharing posts. The implementation is stabilizing fast—VRAM bugs patched in real-time, performance ceiling rising. If you haven't tested MTP on your local stack, friction is lower every day.

qwen3.6

4 mentions across r/LocalLLaMA, r/LocalLLaMA, r/LocalLLaMA, r/LocalLLaMA

Qwen3.6 35B A3B is generating sustained community adoption: workflow transformation reports, direct benchmarks vs Copilot/Claude Code/Pi/OpenCode, and inference tuning posts (110 tok/s on 12GB VRAM). It's becoming the community reference model for local agentic coding. Three-plus days of this signal means it's worth prioritizing an eval if you run local coding agents.