Fully local agentic search on a single 3090 just hit 95.7% on SimpleQA.
Top Signal
Qwen3.6-27B + Agentic Search Hits 95.7% SimpleQA on a Single 3090
workflow
r/LocalLLaMA
The maintainer of LDR (Local Deep Research) reports that Qwen3.6-27B paired with their agentic search framework now scores 95.7% on SimpleQA while running entirely on a single RTX 3090: frontier-grade factual accuracy at zero API cost. The setup combines Qwen3.6-27B's strong reasoning with iterative web search and self-verification loops; the model searches, reads, cross-references, then answers. For builders, this crosses the threshold where local agentic RAG can replace cloud API calls for many factual retrieval tasks. If you're building agents that need accurate answers and you have a 24GB GPU, this stack is now viable for production. What makes it work is Qwen3.6's instruction-following quality at 27B parameters combined with structured agentic scaffolding; the scaffold matters as much as the model.
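For flavor, here's a minimal sketch of that search, read, cross-check, answer loop. It assumes a local OpenAI-compatible endpoint (llama.cpp or vLLM) and a `web_search` helper you supply yourself; the served model name and prompts are placeholders, and LDR's real scaffold is more elaborate than this:

```python
# Minimal sketch of the search -> read -> cross-check -> answer loop.
# Assumes a local OpenAI-compatible server and a web_search() helper
# you supply; model name and prompts are placeholders, not LDR's code.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")
MODEL = "qwen3.6-27b"  # hypothetical served-model name


def web_search(query: str) -> str:
    """Placeholder: return concatenated snippets from your search backend."""
    raise NotImplementedError


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()


def agentic_answer(question: str, max_rounds: int = 4) -> str:
    notes: list[str] = []
    for _ in range(max_rounds):
        # Let the model decide: gather more evidence, or commit to an answer.
        plan = ask(
            f"Question: {question}\nVerified notes:\n" + "\n".join(notes) +
            "\nReply with one line. 'SEARCH: <query>' to gather evidence, "
            "or 'ANSWER: <answer>' only if the notes already settle it."
        )
        if plan.startswith("ANSWER:"):
            return plan.removeprefix("ANSWER:").strip()
        query = plan.removeprefix("SEARCH:").strip()
        # Read the results and keep only facts that cross-check.
        notes.append(ask(
            f"From these results:\n{web_search(query)}\n"
            f"extract only facts relevant to '{question}', noting conflicts."
        ))
    return "unresolved after max rounds"
```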
Read more →
Fast Signals
Open Design: Turn Your Coding Agent into a Design Engine
new tool
HN Front Page
Open-source tool on GitHub (162 HN points) that lets you use coding agents like Claude Code or Codex as design tools — generating UI components, layouts, and visual assets from natural language specs. If you're a solo dev or small team shipping products without a designer, this bridges the gap between 'I can describe what I want' and 'I have a working UI.'
Link →
Unsloth Fixes Broken Mistral Medium 3.5 GGUFs — Re-download Now
platform change
r/LocalLLaMA
All Mistral Medium 3.5 128B GGUFs were producing degraded outputs, especially at long context. Unsloth worked with Mistral to identify and fix the bug, and updated quants are now available. If you downloaded Mistral Medium 3.5 GGUFs before May 1, re-download — your outputs were silently wrong.
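To refresh in place, something like the following works with huggingface_hub; the repo id and filename pattern below are placeholders, so substitute Unsloth's actual repo and the quant you run:

```python
# Force-refresh previously downloaded GGUFs so the fixed quants replace
# the broken ones. Repo id is a placeholder -- use Unsloth's actual repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Mistral-Medium-3.5-GGUF",  # hypothetical repo name
    allow_patterns=["*Q4_K_M*.gguf"],           # grab only the quant you run
    local_dir="models/mistral-medium-3.5",
    force_download=True,                        # ignore the stale cache copy
)
```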
Link →
Qwen 3.6 Wins Benchmarks, Gemma 4 Wins Reality on Vision Tasks
research to practice
r/LocalLLaMA
Head-to-head local testing of Qwen3.6-27B vs Gemma4-31B on vision tasks, run through vLLM with FP8, finds that Gemma 4 outperforms on real-world visual understanding despite its lower benchmark scores. The tester documents 7 specific findings and flags 'benchmaxing' as a real phenomenon. If you're choosing a local vision model, test on YOUR data; benchmark rankings can mislead badly on vision.
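Following the 'test on YOUR data' advice takes very little code. A tiny harness like this is enough to start; the model ids and prompt template are assumptions, since each VLM family expects its own image tokens, so adapt both before running:

```python
# Tiny harness: run your own image questions through a local VLM via
# vLLM and eyeball the answers. Model id comes from argv; invoke once
# per model (e.g. `python vlm_check.py Qwen/Qwen3.6-27B-VL`) to avoid
# GPU memory contention. Prompt template is schematic, not universal.
import sys
from PIL import Image
from vllm import LLM, SamplingParams

CASES = [("invoice.png", "What is the total amount due?")]  # your own data

llm = LLM(model=sys.argv[1], quantization="fp8", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=128)

for path, question in CASES:
    out = llm.generate(
        {"prompt": f"USER: <image>\n{question}\nASSISTANT:",  # adapt per model
         "multi_modal_data": {"image": Image.open(path)}},
        params,
    )
    print(f"{path}: {out[0].outputs[0].text.strip()}")
```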
Link →
Agent-Desktop: Native Desktop Automation CLI for AI Agents
new tool
HN Show
Open-source CLI that gives AI agents native desktop control — clicking, typing, screenshots, window management — without browser automation overhead. Unlike Anthropic's computer-use approach or browser-based automation, this works at the OS level. Useful if you're building agents that need to interact with native apps, not just browsers.
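The integration pattern looks roughly like the sketch below; note the agent-desktop subcommands shown are hypothetical stand-ins, not the project's documented interface, so check its README for the real commands:

```python
# Pattern sketch: exposing a desktop-automation CLI to an agent as tool
# calls. The agent-desktop subcommands used here (click, type,
# screenshot) are hypothetical -- consult the project's README for its
# actual interface.
import subprocess

def desktop(*args: str) -> str:
    """Run one (hypothetical) agent-desktop command, return its stdout."""
    return subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True, check=True
    ).stdout

# A tool table an agent framework could register and dispatch on:
TOOLS = {
    "click":      lambda x, y: desktop("click", str(x), str(y)),
    "type_text":  lambda text: desktop("type", text),
    "screenshot": lambda path: desktop("screenshot", "--out", path),
}
```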
Link →
VS Code Inserting 'Co-Authored-by Copilot' Even When Copilot Wasn't Used
platform change
HN Front Page
A PR with 524 HN points reveals VS Code was silently adding Copilot co-author attribution to git commits regardless of whether Copilot generated any code. Beyond the trust issue, this matters for builders shipping to clients or regulated environments where AI attribution has contractual or compliance implications. Check your recent commit history if you use VS Code.
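Auditing your history needs nothing beyond git itself; a quick scan for the trailer looks like this:

```python
# Check recent history for Copilot co-author trailers that VS Code may
# have added without Copilot being used. Pure git; assumes only that
# you run it inside the repo you want to audit.
import subprocess

result = subprocess.run(
    ["git", "log", "--all", "-E",
     "--grep=Co-authored-by:.*Copilot", "--format=%h %ad %s", "--date=short"],
    capture_output=True, text=True, check=True,
)
hits = result.stdout.strip()
print(hits if hits else "No Copilot co-author trailers found.")
```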
Link →
Hybrid On-Device Android Inference: llama.cpp + LiteRT + NPU Routing
workflow
r/LocalLLaMA
New approach for mobile inference that routes layers between llama.cpp (CPU/GPU) and LiteRT (NPU) on Android devices. This hybrid architecture lets you split model execution across hardware accelerators on-device. If you're building local-first mobile AI apps, this is the first practical multi-backend routing pattern for Android.
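There's no public glue API between the two runtimes yet, so this is only a toy sketch of the per-layer routing decision; the op taxonomy and backend names are illustrative assumptions, not the actual integration:

```python
# Toy sketch of per-layer backend routing: send NPU-friendly ops to
# LiteRT, everything else to llama.cpp on CPU/GPU. Op names and backend
# labels are assumptions for illustration only.
from dataclasses import dataclass

NPU_FRIENDLY_OPS = {"matmul_int8", "conv2d", "layernorm"}  # assumed NPU support

@dataclass
class Layer:
    name: str
    op: str

def route(layers: list[Layer]) -> dict[str, str]:
    """Assign each layer to 'litert-npu' or 'llamacpp-cpu/gpu'."""
    return {
        layer.name: (
            "litert-npu" if layer.op in NPU_FRIENDLY_OPS else "llamacpp-cpu/gpu"
        )
        for layer in layers
    }

if __name__ == "__main__":
    model = [Layer("attn.qkv", "matmul_int8"), Layer("attn.softmax", "softmax"),
             Layer("mlp.up", "matmul_int8"), Layer("norm", "layernorm")]
    for name, backend in route(model).items():
        print(f"{name:14s} -> {backend}")
```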
Link →
Radar
Flare-TTS 28M: Tiny TTS Model Trained from Scratch
A 28M-parameter text-to-speech model trained on a single A6000 in 24 hours. At this size it could run on microcontrollers or phones. Worth watching if you need ultra-lightweight voice synthesis for IoT or embedded use cases.
Link →
Loopsy: Cross-Machine Agent Communication via Terminals
Lets terminals and AI agents on different machines coordinate — run commands, transfer files, delegate coding agent work across your hardware. Early-stage but addresses the real problem of multi-machine agent orchestration without cloud infrastructure.
Link →
Warpdrv: llama.cpp Launcher for Strix Halo + RTX
Open-source launcher purpose-built for daily-driving Qwen 35B/27B on AMD Strix Halo with RTX Pro hybrid setups. Signals the maturing AMD+NVIDIA mixed-hardware local inference workflow.
Link →
Convergence Watch
qwen 3.6
TRENDING
8 mentions across r/LocalLLaMA, HN Front Page, GitHub Trending
Day 7 of sustained multi-source coverage. Today's signal shifts from 'impressive benchmarks' to practical deployment patterns: agentic search scaffolds, vision model comparisons, Windows vLLM launchers, and KV cache optimization. Qwen3.6-27B is becoming the default local model for agentic workloads.
mistral medium 3.5
TRENDING
3 mentions across r/LocalLLaMA
Day 3 of coverage; today's focus is the critical GGUF bugfix. The broken-quant incident is a cautionary tale: Mistral Medium 3.5 was silently producing degraded outputs for days. Unsloth's fix is now live. Re-download if you pulled GGUFs before May 1.
local agentic coding
TRENDING
4 mentions across r/LocalLLaMA, HN Show, HN Front Page
Multiple threads today about running coding agents locally — KV cache quantization tradeoffs for agent workloads, agent-desktop for native OS control, open-design for agent-driven UI generation. The ecosystem is shifting from 'can local models do this?' to 'how do I optimize my local agent stack?'