Agent sandboxing reaches the infrastructure layer: three independent tools for isolating model-generated code dropped on the same day.
Top Signal
Microsoft MXC: production-grade sandbox for running untrusted agent code
new tool
GitHub Trending
Microsoft open-sourced MXC (Microsoft eXecution Container), a policy-driven, layered sandboxed execution system built explicitly to run untrusted model output, plugins, and tools on Windows, Linux, and macOS. Unlike generic containers, MXC is designed for the agent use case: you define policies that restrict what LLM-generated code can do, and multiple containment backends enforce them. This fills a real gap in the agent builder stack: an isolation layer that doesn't require rolling your own seccomp filters or spinning up throwaway VMs per inference call. It surfaced today alongside micropython-wasm (Python sandboxing) and Kyushu (WASM JS sandboxing), so three independent sources converged on the same problem in a single day. Agent sandboxing is hardening into standard infrastructure rather than a per-project solve. Action: evaluate MXC as your containment layer for any pipeline that executes model-generated code in production.
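To make "policy-driven" concrete, here is a minimal Python sketch of the general idea: a declarative policy object that a runner enforces around untrusted code. This is not MXC's API. The names `Policy`, `Result`, and `run_untrusted` are hypothetical, the sketch assumes a POSIX host, and a subprocess with resource limits is far weaker than a real containment backend.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass

try:
    import resource  # POSIX only; absent on Windows
except ImportError:
    resource = None


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: the limits the runner must enforce."""
    wall_seconds: float = 5.0      # hard wall-clock deadline
    cpu_seconds: int = 2           # CPU-time cap (POSIX RLIMIT_CPU)
    max_output_bytes: int = 10_000  # truncate captured stdout


@dataclass(frozen=True)
class Result:
    ok: bool
    stdout: str
    reason: str


def run_untrusted(code: str, policy: Policy) -> Result:
    """Run model-generated Python in a child process under the policy."""

    def apply_limits() -> None:
        # Runs in the child just before exec.
        limit = (policy.cpu_seconds, policy.cpu_seconds)
        resource.setrlimit(resource.RLIMIT_CPU, limit)

    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,          # scratch directory, removed afterwards
                env={},               # do not leak the parent's environment
                capture_output=True,
                timeout=policy.wall_seconds,
                preexec_fn=apply_limits if resource else None,
            )
        except subprocess.TimeoutExpired:
            return Result(False, "", "wall-clock limit exceeded")

    out = proc.stdout[: policy.max_output_bytes].decode(errors="replace")
    if proc.returncode != 0:
        return Result(False, out, f"exit code {proc.returncode}")
    return Result(True, out, "completed")
```

The design point is the separation: the policy is data that can be reviewed and versioned, while the enforcement mechanism can be swapped for something stronger without touching the callers.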
Fast Signals
llama.cpp Gemma 4 MTP support merged: pull and run
platform change
r/LocalLLaMA
Multi-token prediction for Gemma 4 landed in llama.cpp mainline today. Combined with the QAT weights released last week, this is the exact stack that was hitting 120 tok/s on 12GB VRAM in benchmarks. Update llama.cpp and enable MTP now — no more patched builds required.
KVarN holds up across 75 Qwen 3.6 27B quant pairs
research to practice
r/LocalLLaMA
A comprehensive community benchmark across 75 precision pairs indicates that KVarN's 1-bit advantage is model-agnostic: on Qwen 3.6 27B, 6-bit KVarN matches standard q8_0 and 4-bit matches q5_0. With two consecutive days of cross-model validation, KVarN is the KV cache quant to adopt if you're running long-context inference.
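For a rough sense of what matching quality at fewer bits means in memory terms, the sketch below computes KV cache size at nominal bit widths. The model shape is hypothetical (it is not Qwen 3.6 27B's actual configuration), and the numbers ignore the per-block scale data that block formats such as q8_0 also store, so real sizes run somewhat larger.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bits_per_value: int) -> int:
    """Nominal KV cache size: keys plus values for every layer and token."""
    values = 2 * n_layers * n_kv_heads * head_dim * context_tokens  # 2 = K and V
    return values * bits_per_value // 8


# Hypothetical shape, chosen only to give round numbers.
SHAPE = dict(n_layers=64, n_kv_heads=8, head_dim=128, context_tokens=32_768)

if __name__ == "__main__":
    for bits in (8, 6, 5, 4):
        gib = kv_cache_bytes(**SHAPE, bits_per_value=bits) / 2**30
        print(f"{bits}-bit KV cache: {gib:.2f} GiB")
```

Under these assumptions, moving from 8 to 6 nominal bits cuts the cache by a quarter, which is why the benchmark result matters most at long context lengths.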
Kyushu: self-hostable WASM sandbox for JavaScript workers
new tool
HN Show
Show HN with 70 points: Kyushu runs JavaScript in a zero-dependency WASM sandbox for plugin execution and agent tool use. Complements micropython-wasm for JS-native stacks — a sandboxing option for agent builders who aren't on Python and don't want to run a full container per tool call.
Nemotron 3.5 ASR: 40+ languages, 4.5x realtime on CPU, dockerized
new tool
r/LocalLLaMA
A builder migrated from Parakeet and documented the results: better multilingual support across 40+ locales, streaming, and 4.5x realtime speed on CPU-only inference. If you're building voice pipelines, this is a comparison worth running yourself.
Qwen 3.6 27B scores 2% on DeepSWE — local coding agent ceiling mapped
research to practice
r/LocalLLaMA
A 70-hour community benchmark places Qwen 3.6 27B at 2% on DeepSWE (18/20, above Haiku 4.5), averaging 32 minutes and 44k output tokens per task. The calibration is useful: fully autonomous SWE remains frontier-only, but these cost and latency numbers help scope supervised local agent loops.
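The per-task averages above translate directly into a budget estimate. The following Python sketch does that arithmetic; only the two constants come from the benchmark summary, while the function names and the assumption that tasks run one at a time are illustrative.

```python
from dataclasses import dataclass

MINUTES_PER_TASK = 32            # average from the benchmark summary
OUTPUT_TOKENS_PER_TASK = 44_000  # average from the benchmark summary


@dataclass(frozen=True)
class Estimate:
    tasks: int
    wall_hours: float
    output_tokens: int


def effective_tokens_per_second() -> float:
    """End-to-end output rate, including time not spent decoding."""
    return OUTPUT_TOKENS_PER_TASK / (MINUTES_PER_TASK * 60)


def estimate(tasks: int) -> Estimate:
    """Scale the averages to a batch of tasks run sequentially (assumed)."""
    return Estimate(
        tasks=tasks,
        wall_hours=tasks * MINUTES_PER_TASK / 60,
        output_tokens=tasks * OUTPUT_TOKENS_PER_TASK,
    )


if __name__ == "__main__":
    print(f"effective rate: {effective_tokens_per_second():.1f} tok/s")
    print(estimate(15))
```

The effective rate is an end-to-end average over the whole task, so it will sit below the raw decode speed of the same hardware.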
Jane Street: Claude Code replaced Figma for UI design work
workflow
HN Front Page
A Jane Street practitioner describes using Claude Code as the primary design tool, not just for implementation but for the design decisions themselves. This is concrete workflow validation, from a production engineering context, that Claude-as-designer is past the demo stage.
Radar
dvlt.cu: from-scratch CUDA engine for NVIDIA's 3D transformer
NVIDIA's DVLT 3D transformer model gets a hand-written CUDA/C++ inference engine. 3D transformers are an emerging architecture outside the standard attention stack, and dedicated tooling appearing this early is worth watching for builders tracking inference beyond that stack.
MoQ + GSQ: next-gen GGUF quantization, better quality same bits
New quantization methods promise higher quality than current GGUF formats at identical bit widths. They are still in development but directionally significant: the quantization toolkit is getting a meaningful upgrade that will flow downstream to all local model users.
Convergence Watch
agent code sandboxing
TRENDING
3 mentions across Simon Willison, GitHub Trending, HN Show
micropython-wasm (Python), Kyushu (JavaScript), and Microsoft MXC (cross-language, enterprise-grade) all surfaced from independent sources on the same day. Agent sandboxing is transitioning from a per-project problem to a dedicated infrastructure layer. This is the inflection signal.
gemma 4 qat
TRENDING
8 mentions across r/LocalLLaMA, HN Front Page
Third consecutive day of heavy coverage. Today's key development: llama.cpp Gemma 4 MTP support merged, completing the QAT+MTP stack in stock llama.cpp. Patched builds are no longer needed, so the full performance story is now live on consumer hardware.
kvarn
4 mentions across r/LocalLLaMA, HN Front Page
Second day of validation data, now extending to Qwen 3.6 27B with 75 benchmark pairs. The 1-bit precision advantage appears to be model-agnostic, and two days of cross-model evidence strengthen the adoption case for long-context production deployments.
meta ai account takeover
TRENDING
2 mentions across HN Front Page, r/LocalLLaMA
Four separate days of coverage, and Meta has confirmed that thousands of accounts were compromised. The attack vector, an AI support bot used as a privilege escalation tool via plain-language instructions, is a design failure relevant to any chatbot deployment that can take account-level actions.
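One way to avoid this class of failure is to make authorization independent of anything said in the conversation. The Python sketch below illustrates that idea. It is not a description of Meta's system or of any real framework; the action names and the `Session`, `ToolCall`, and `authorize` names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical account-level actions that must never be granted by chat text alone.
SENSITIVE_ACTIONS = frozenset({"change_email", "reset_password", "disable_2fa"})


@dataclass(frozen=True)
class Session:
    user_id: str
    # Set only by an out-of-band check (for example a code sent to the
    # contact on file), never by the model or by user-supplied text.
    verified: bool


@dataclass(frozen=True)
class ToolCall:
    action: str
    target_user_id: str


def authorize(call: ToolCall, session: Session) -> tuple[bool, str]:
    """Decide whether a model-proposed tool call may run.

    The conversation transcript is deliberately not an input.
    """
    if call.target_user_id != session.user_id:
        return False, "target account does not match the authenticated session"
    if call.action in SENSITIVE_ACTIONS and not session.verified:
        return False, "sensitive action requires out-of-band verification"
    return True, "allowed"
```

Because the gate sits outside the model, a persuasive prompt can change what the bot proposes but not what the system executes.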