The Brief, Tuesday, May 26, 2026

PromptArmor published a working file-exfiltration exploit against Microsoft Copilot Cowork this week, and the full writeup is worth reading as a threat-modeling template. Malicious content embedded in a document instructs Copilot to send file contents to an attacker-controlled endpoint. No click, no confirmation, no user interaction beyond opening the file. The useful output here is the attack surface definition: any AI feature combining file access, external content ingestion, and outbound request capability. That triad describes most enterprise RAG deployments shipping today. It probably describes something you shipped.
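A quick way to apply that definition to your own feature inventory is to flag anything that holds all three capabilities at once. The sketch below is a minimal illustration, not PromptArmor's method: the AIFeature fields, the feature names, and the audit helper are all hypothetical, and in practice the capability flags would come from your own architecture review.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AIFeature:
    """Hypothetical inventory record for one AI-powered feature."""
    name: str
    reads_files: bool                # can access user or tenant files
    ingests_external_content: bool   # processes documents/web content it did not author
    makes_outbound_requests: bool    # can trigger network calls (fetch, webhooks, image loads)


def has_exfiltration_triad(feature: AIFeature) -> bool:
    """True when the feature combines all three capabilities named above."""
    return (
        feature.reads_files
        and feature.ingests_external_content
        and feature.makes_outbound_requests
    )


def audit(features: Iterable[AIFeature]) -> List[str]:
    """Return the names of features that need a prompt-injection review."""
    return [f.name for f in features if has_exfiltration_triad(f)]


if __name__ == "__main__":
    # Illustrative inventory; names and flags are made up.
    inventory = [
        AIFeature("doc-summarizer", True, True, True),
        AIFeature("internal-search", True, False, False),
        AIFeature("web-research-agent", False, True, True),
    ]
    print(audit(inventory))
```

Removing any one leg, most commonly outbound requests, takes a feature out of the flagged set. That is usually the cheapest mitigation to evaluate first.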

Long-Context Inference Is Getting Cheaper, Not Yet Here

A cluster of research this week points in the same direction: serving long context windows is on track to get dramatically cheaper, but the work is still research, not production. Worth tracking the trajectory; nothing to deploy yet.

NuExtract3: Document Parsing Without the API Bill

NuMind released NuExtract3, a 4B vision-language model for structured JSON extraction from PDFs, images, and OCR'd documents. Self-hostable, open-weight, purpose-built for the document parsing pipeline you are currently running through GPT-4V at API rates. At 4B parameters it runs on modest hardware. The operator calculus is straightforward: if your invoice, form, or report extraction volume makes API costs visible on the P&L, benchmark this before your next billing cycle.
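The calculus reduces to a break-even volume: the monthly page count above which a fixed self-hosting bill beats per-page API pricing. The sketch below assumes a simple linear cost model. The function name, its parameters, and the example figures are placeholders for illustration, not actual GPT-4V or NuExtract3 prices.

```python
def breakeven_pages_per_month(
    api_cost_per_page: float,
    selfhost_fixed_monthly: float,
    selfhost_cost_per_page: float = 0.0,
) -> float:
    """Monthly page volume above which self-hosting is cheaper than the API.

    Assumes API cost scales linearly with pages and self-hosting is a fixed
    monthly cost plus an optional per-page marginal cost.
    """
    saving_per_page = api_cost_per_page - selfhost_cost_per_page
    if saving_per_page <= 0:
        # Self-hosting never pays off if each page costs as much as the API.
        return float("inf")
    return selfhost_fixed_monthly / saving_per_page


if __name__ == "__main__":
    # Placeholder numbers only; substitute your own invoice and hardware costs.
    pages = breakeven_pages_per_month(
        api_cost_per_page=0.25,
        selfhost_fixed_monthly=500.0,
        selfhost_cost_per_page=0.125,
    )
    print(f"Break-even at {pages:.0f} pages per month")
```

If your real volume sits well above the number this returns for your own costs, the benchmark is worth the engineering time. If it sits below, the API bill is probably not the problem to solve first.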

Qwen3.6 Consensus Is Solidifying

Qwen3.6 35B A3B has held the top of r/LocalLLaMA for five consecutive days. Community discussion has moved past initial benchmarks into production reports. The signal here is durability. Models flash through the subreddit constantly. Five days of sustained, increasingly specific discussion with source counts rising day over day suggests this one has legs as the default local agent backbone.

On the Radar

Anthropic published an open-source repo of knowledge-worker plugins for Claude Cowork on the same day the Copilot file-exfiltration vulnerability dropped. Timing is coincidence, but the juxtaposition is instructive: if you are building on Cowork's plugin API, study the intended security model before deploying. Separately, cmux is an early-stage macOS terminal built around treating AI coding agents as first-class session types, with vertical tabs and per-agent notifications. And Xiaomi iterated on MiMo with a V2.5 coder variant. No benchmarks yet. Worth watching for community evals this week.

The through-line this week is infrastructure assumptions getting tested. The prompt injection triad. The memory wall for long-context serving. The API cost structure for document parsing. Each of these looked settled a quarter ago. None of them are settled now.
