Mozilla weaponized Claude Mythos against its own codebase — and found hundreds of real vulnerabilities.
Top Signal
Mozilla Used Claude Mythos to Find Hundreds of Real Firefox Vulnerabilities
research to practice
Simon Willison, HN Front Page
Mozilla published a detailed case study on using its early access to Claude Mythos to systematically audit the Firefox codebase for security vulnerabilities, and the audit found hundreds of real, fixable bugs. This isn't another 'AI finds toy bugs in CTF challenges' story: Mozilla ran the model against production C++ at scale, and the hit rate was high enough to change its security workflow. The technique is straightforward: feed large codebases to a model with a sufficient context window and domain knowledge, then triage the output. The key insight is that LLM-powered auditing works best not as a replacement for fuzzing or static analysis, but as a complementary pass that catches logic-level vulnerabilities those tools miss. If you maintain any substantial codebase, this is the strongest evidence yet that LLM security auditing is production-ready. Bookmark the post for the specific prompting patterns Mozilla used.
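As a rough illustration of the feed-then-triage loop, here is a minimal sketch. It assumes a hypothetical `ask_model(path, chunk)` callable that returns a list of dicts with `offset`, `severity`, and `summary` keys; that interface, the `Finding` type, and the severity labels are invented for this example and do not reflect Mozilla's prompts or pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    severity: str  # "high", "medium", or "low" (labels assumed for illustration)
    summary: str

SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}

def chunk_lines(text: str, max_lines: int = 200) -> Iterator[tuple[int, str]]:
    """Yield (first_line_number, chunk) so findings can be mapped back to the file."""
    lines = text.splitlines()
    for start in range(0, len(lines), max_lines):
        yield start + 1, "\n".join(lines[start:start + max_lines])

def audit(
    files: dict[str, str],
    ask_model: Callable[[str, str], list[dict]],  # hypothetical model wrapper
    max_lines: int = 200,
) -> list[Finding]:
    """Send every chunk to the model, dedupe the findings, and sort for triage."""
    findings: set[Finding] = set()
    for path, text in files.items():
        for first_line, chunk in chunk_lines(text, max_lines):
            for raw in ask_model(path, chunk):
                findings.add(
                    Finding(path, first_line + raw["offset"], raw["severity"], raw["summary"])
                )
    # Highest severity first, then stable by file and line for human review.
    return sorted(findings, key=lambda f: (SEVERITY_RANK[f.severity], f.path, f.line))
```

The sorted list is only a review queue: each finding still needs a human (or a fuzzer or static analyzer) to confirm it before it is treated as a real bug.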
Fast Signals
antirez Ships ds4: DeepSeek 4 Flash Inference Engine for Metal
new tool
HN Front Page
Salvatore Sanfilippo (antirez, the Redis creator) released ds4, a purpose-built local inference engine for running DeepSeek 4 Flash on Apple Silicon via Metal. The release drew 253 HN points and active discussion. If you're running DeepSeek models locally on a Mac, it is worth benchmarking against llama.cpp: antirez's track record suggests it's optimized for the specific architecture rather than built as a general-purpose runtime.
"Agents Need Control Flow, Not More Prompts" Hits 280 Points on HN
workflow
HN Front Page
A blog post arguing that the agent reliability bottleneck is architectural, not prompt-level, resonated hard with builders. The core thesis: treat agent orchestration as a programming problem (explicit state machines, retry logic, branching) rather than trying to prompt your way to reliability. If you're debugging flaky agents, start here before adding another system prompt paragraph.
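To make the thesis concrete, here is a minimal sketch of agent orchestration as an explicit state machine with bounded retries and branching. The `plan`, `act`, and `verify` callables are hypothetical stand-ins for model or tool calls; the state names and the retry policy are assumptions for illustration, not taken from the blog post.

```python
from enum import Enum, auto
from typing import Callable

class State(Enum):
    PLAN = auto()
    ACT = auto()
    VERIFY = auto()
    DONE = auto()
    FAILED = auto()

def run_agent(
    plan: Callable[[], str],        # hypothetical: produce the next step
    act: Callable[[str], str],      # hypothetical: execute the step, may raise
    verify: Callable[[str], bool],  # hypothetical: programmatic check of the result
    max_retries: int = 3,
) -> tuple[State, list[State]]:
    """Drive the agent through explicit states; return the final state and the trace."""
    state = State.PLAN
    trace = [state]
    attempts = 0
    step = result = ""
    while state not in (State.DONE, State.FAILED):
        if state is State.PLAN:
            step = plan()
            state = State.ACT
        elif state is State.ACT:
            try:
                result = act(step)
                state = State.VERIFY
            except Exception:
                # Retry the same step, but only a bounded number of times.
                attempts += 1
                state = State.ACT if attempts < max_retries else State.FAILED
        elif state is State.VERIFY:
            if verify(result):
                state = State.DONE
            else:
                # Branch back to planning instead of hoping a prompt tweak fixes it.
                attempts += 1
                state = State.PLAN if attempts < max_retries else State.FAILED
        trace.append(state)
    return state, trace
```

Because every transition is recorded in `trace`, a flaky run can be debugged by reading the state sequence rather than by rereading prompts.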
Agent Skills Pattern Emerges Across GitHub Trending and HN Show
emerging signal
GitHub Trending, HN Show
Two independent projects landed simultaneously: addyosmani/agent-skills (production-grade engineering skills for coding agents, trending on GitHub) and agent-skills-eval on HN Show (a framework to test whether agent skills actually improve outputs). The convergence suggests 'skills' — reusable, testable capability modules for agents — is crystallizing as a design pattern. Worth studying both repos if you're building agent tooling.
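One way to picture a reusable, testable skill is a named bundle of instructions plus an evaluation that compares pass rates with and without it. The sketch below is a hypothetical illustration only: `Skill`, `pass_rate`, and `skill_lift` are invented names and do not reflect the actual interfaces of addyosmani/agent-skills or agent-skills-eval.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str  # guidance the agent loads when the skill is active

# Hypothetical agent signature: (task prompt, active skills) -> output text.
Agent = Callable[[str, tuple[Skill, ...]], str]
Check = Callable[[str], bool]

def pass_rate(agent: Agent, tasks: list[tuple[str, Check]], skills: tuple[Skill, ...] = ()) -> float:
    """Fraction of tasks whose output passes its programmatic check."""
    passed = sum(1 for prompt, check in tasks if check(agent(prompt, skills)))
    return passed / len(tasks)

def skill_lift(agent: Agent, tasks: list[tuple[str, Check]], skill: Skill) -> float:
    """Pass rate with the skill minus the baseline pass rate without it."""
    return pass_rate(agent, tasks, (skill,)) - pass_rate(agent, tasks)
```

A lift near zero (or negative) on a representative task set is a signal that the skill is not earning its place in the agent's context.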
Anthropic Publishes Natural Language Autoencoders Research
research to practice
HN Front Page
New Anthropic research on extracting Claude's internal reasoning as readable text — not just chain-of-thought, but compressed representations of what the model 'knows' at each layer. This is interpretability research that may eventually let you debug why your agent made a bad decision. Not actionable today, but if you care about model internals, this is the paper to read.
Dirtyfrag: Universal Linux Local Privilege Escalation — Patch Now
platform change
HN Front Page
A new universal Linux local privilege escalation (LPE) exploit was posted to oss-security and drew 325 HN points. If you run any Linux servers (inference boxes, deploy targets, CI runners), check your kernel version and patch. The bug affects the networking stack, and the exploit is reportedly reliable across distributions.
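For a first-pass check across a fleet, a small script can compare the running kernel release against a fixed-in version. This is a sketch under assumptions: the fixed-in version is not given here and must come from your distribution's advisory, and the helper names are invented for illustration. Distributions often backport fixes without changing the upstream version number, so treat the result as a hint and confirm against the vendor's package changelog.

```python
import platform
import re

def parse_kernel(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return major, minor, patch

def needs_patch(running: str, fixed_in: str) -> bool:
    """True when the running release is numerically older than the fixed-in version."""
    return parse_kernel(running) < parse_kernel(fixed_in)

def running_kernel_needs_patch(fixed_in: str) -> bool:
    """Check this machine; fixed_in must be taken from your distro's advisory."""
    return needs_patch(platform.release(), fixed_in)
```

Comparing parsed tuples rather than raw strings matters because a string comparison would rank `6.10` below `6.9`.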
AlphaEvolve: DeepMind's Coding Agent Scales Across Research Domains
research to practice
HN Front Page
Google DeepMind published results on AlphaEvolve, a Gemini-powered coding agent that generates and evolves code solutions across math, science, and engineering. 235 HN points. The interesting bit for builders isn't the agent itself but the evolutionary search strategy — generate many candidates, evaluate programmatically, evolve winners. Useful pattern for any domain where you can define a fitness function.
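The generate, evaluate, evolve loop can be sketched in a few lines. This is a generic elitist evolutionary search, not AlphaEvolve's implementation: in AlphaEvolve the candidates are programs and a Gemini model proposes the changes, whereas here `mutate` is a hypothetical stand-in you supply, and all parameter names and defaults are assumptions.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

def evolve(
    fitness: Callable[[T], float],           # programmatic evaluator, higher is better
    mutate: Callable[[T, random.Random], T], # stand-in for a model-proposed change
    seed_population: Sequence[T],
    generations: int = 50,
    survivors: int = 5,
    children_per_survivor: int = 4,
    rng: Optional[random.Random] = None,
) -> T:
    """Keep the top candidates each generation and refill the pool with their mutations."""
    rng = rng or random.Random(0)
    population = list(seed_population)
    for _ in range(generations):
        winners = sorted(population, key=fitness, reverse=True)[:survivors]
        children = [mutate(w, rng) for w in winners for _ in range(children_per_survivor)]
        # Winners stay in the pool, so the best fitness never decreases.
        population = winners + children
    return max(population, key=fitness)
```

For example, `evolve(lambda x: -(x - 3) ** 2, lambda x, r: x + r.gauss(0, 0.5), [-10.0, 0.0, 10.0])` searches for the number closest to 3; swapping in a real evaluator and a real candidate generator is the part that depends on your domain.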
Radar
InsForge: Postgres Backend Built for Coding Agents
A new Postgres-based backend that bundles auth, storage, compute, hosting, and an AI gateway, designed specifically for coding agents. It is early-stage but addresses a real pain point: agents need infrastructure primitives without manual setup.
DeerFlow 2.0: ByteDance's Long-Horizon Agent Harness
ByteDance open-sourced v2.0 of DeerFlow, a SuperAgent harness with sandboxes, memory, sub-agents, and skill systems for tasks that take minutes to hours. Worth watching as a reference architecture for complex multi-step agent systems.
Convergence Watch
local agentic coding
TRENDING
3 mentions across HN Front Page, GitHub Trending, HN Show
Day 6 of sustained cross-source signal. Today's ds4 (Metal-native DeepSeek inference) plus agent-skills repos show the stack maturing: local models are getting dedicated runtimes while agent tooling gets standardized skill interfaces. The gap between cloud and local agent capability continues narrowing.
agent skills pattern
3 mentions across GitHub Trending, HN Show, Simon Willison
New convergence today. addyosmani/agent-skills, agent-skills-eval, and Anthropic's own financial-services skills repo all landed the same day. 'Skills' as reusable agent capability modules is solidifying as a shared vocabulary across the ecosystem.
qwen 3.6
TRENDING
2 mentions across r/LocalLLaMA, GitHub Trending
Day 7 of sustained presence. The model has moved from benchmarks to production tooling: the community is now building specialized runtimes and quantizations rather than debating scores. Adoption phase, not hype phase.
multi-token prediction
TRENDING
2 mentions across HN Front Page, GitHub Trending
Day 3 across 4+ sources. MTP is transitioning from research curiosity to shipping feature — llama.cpp beta support, Gemma draft models, and Qwen MTP grafting all converging. If you serve local models, MTP support is becoming table stakes.
SOURCE DOWN: r/LocalLLaMA returned 0 items