Ornith-1.0 brings self-scaffolding agents; DeepSeek V4 lands in llama.cpp; Qwen 3.6 27B is the new local consensus.
Top Signal
Ornith-1.0: MIT model that rewrites its own agent scaffolding
research to practice
Simon Willison, HN Front Page, r/LocalLLaMA
DeepReinforce shipped Ornith-1.0, an MIT-licensed model family (9B Dense, 35B, 397B MoE) trained specifically for agentic coding with a self-scaffolding approach: the model generates and revises its own tool-call wrapper during a session instead of being prompted into a fixed loop. Simon Willison covered it, the HN thread reached 133 points, and r/LocalLLaMA already has GGUF serving configs. One user reports a 30–40% token-generation speedup from pairing Ornith-35B as the target with Qwen3.6-35B as the speculative draft model via DFlash. The self-scaffolding framing is architecturally distinct from instruction-following agents because the model is trained to improve its own scaffolding as a task, not just to generate code. Start with the 9B to stress-test the claims; the 35B + Qwen3.6 speculative config is the practical entry point for dual-GPU setups. This is the first release from a previously unknown lab, so watch for follow-up evals.
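For readers new to the target/draft pairing mentioned above, here is a minimal sketch of generic greedy speculative decoding. It is not DFlash and not the llama.cpp API: `toy_target` and `toy_draft` are hypothetical stand-in functions over integer tokens, chosen only to show why a good draft reduces the number of target passes without changing the output.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target verifies the proposal. A real engine scores all k
        #    positions in one batched forward pass, so we count one pass
        #    even though this toy calls the function per position.
        target_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + accepted)
            accepted.append(expected)  # the target's token is always the one kept
            if expected != proposal[i]:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every drafted token matched, so the same pass yields a bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new], target_passes


# Hypothetical toy models: the draft agrees with the target except when the
# context length is a multiple of 3.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 7


if __name__ == "__main__":
    out, passes = speculative_decode(toy_target, toy_draft, [1], max_new=20)
    print("same output as plain greedy:", out == greedy_decode(toy_target, [1], 20))
    print("target passes:", passes, "vs 20 for plain greedy")
```

The output always matches target-only greedy decoding; the gain comes purely from how often the draft guesses right, which is why the reported speedup depends on the specific model pairing.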
Read more →
Fast Signals
Qwen 3.6 27B post hits 537 points on HN as local consensus forms
emerging signal
HN Front Page, r/LocalLLaMA
A Quesma post arguing that Qwen 3.6 27B is the sweet spot for local dev reached 537 points and 470 comments on HN, the strongest community validation signal in weeks. r/LocalLLaMA backs it with real configs: a Q3 quant with the KV cache at Q8 runs in 24GB of VRAM, and a tensor split across dual 4090s over TB3 yields 26 t/s token generation. If you're still on Qwen 3.0 or Llama 3.3, this is your upgrade signal.
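A rough back-of-the-envelope check on why a Q3 quant plus a Q8 KV cache can fit in 24GB. Everything numeric below is an assumption for illustration: about 3.5 bits per weight for a Q3-class quant, about 1 byte per KV element at Q8, and a hypothetical architecture (64 layers, 8 KV heads, head dimension 128) that is not taken from the Qwen 3.6 model card.

```python
GIB = 2**30


def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(
    n_layers: int, n_kv_heads: int, head_dim: int, ctx_len: int, bytes_per_elem: float
) -> float:
    """Approximate KV cache size in GiB: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / GIB


if __name__ == "__main__":
    # Assumed values, not measured ones.
    w = weights_gib(params_billions=27, bits_per_weight=3.5)
    kv = kv_cache_gib(n_layers=64, n_kv_heads=8, head_dim=128,
                      ctx_len=32768, bytes_per_elem=1.0)
    print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{w + kv:.1f} GiB")
```

Under these assumptions the total leaves headroom on a 24GB card for activations and runtime buffers; plug in the real layer and head counts from the model's config before relying on the number.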
Link →
DeepSeek V4 PR merged into llama.cpp mainline
platform change
r/LocalLLaMA
PR #24162 landed, making DeepSeek V4 runnable via standard llama.cpp. To upgrade: git pull, rebuild with cmake, and download the GGUFs. Multiple r/LocalLLaMA users confirmed it's live today. If you run llama.cpp-based inference, this is a same-day upgrade.
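The pull-and-rebuild steps can be scripted. The sketch below assumes an existing llama.cpp checkout at a hypothetical path and uses the project's standard CMake invocation; it deliberately skips the GGUF download because the source does not say where the files are hosted. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def build_commands() -> List[List[str]]:
    """Commands to update and rebuild an existing llama.cpp checkout."""
    return [
        ["git", "pull"],
        ["cmake", "-B", "build"],
        ["cmake", "--build", "build", "--config", "Release"],
    ]


def rebuild(repo_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Print each command, and run it inside repo_dir unless dry_run is set."""
    commands = build_commands()
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands


if __name__ == "__main__":
    # "./llama.cpp" is a placeholder path; point it at your own checkout.
    rebuild("./llama.cpp", dry_run=True)
```

Pass `dry_run=False` once the path is right; add any backend flags your hardware needs to the configure step.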
Link →
video-use: agentic video editing via Claude Code, fully open source
new tool
GitHub Trending
The browser-use team shipped video-use, a fully open-source coding-agent interface for video editing built on Claude Code. This is the first serious attempt at agent-driven video editing, as opposed to prompt-to-video generation. The architecture (an agent plus a video manipulation API) is reusable for content pipeline builders.
Link →
Strix: open-source AI agent for app vulnerability scanning
new tool
GitHub Trending
Strix is an open-source AI agent harness that finds and fixes application vulnerabilities, now on GitHub Trending. Unlike static analyzers, it is agent-driven: it plans, executes, and iterates on attack paths. Actionable for anyone shipping web APIs who wants automated red-teaming in CI.
Link →
Bash4LLM+: call LLM APIs from bash with zero runtime deps
new tool
HN Show
Single-file bash wrapper for LLM APIs using only curl and jq — no Python, no Node. Handles prompt sending, basic chat, and file-line processing. Drop it in your PATH and you can embed LLM calls in cron jobs, CI pipelines, or shell scripts with no environment overhead.
Link →
LongCat-2.0: stealth 1.6T MoE 'owl-alpha' officially released
emerging signal
r/LocalLLaMA
LongCat-2.0 is an MoE with 1.6T total parameters (48B activated per token) that had been quietly served as the unlabeled 'owl-alpha' on OpenRouter and is now formally announced. It is not locally runnable for most, but it signals an emerging pattern: large MoEs deployed anonymously for community evaluation before the reveal.
Link →
Radar
Bolt Graphics GPU: user-swappable DDR5 VRAM slots
Bolt Graphics is designing a GPU with two standard DDR5 laptop (SO-DIMM) slots, making VRAM user-upgradeable after purchase. If it ships, this breaks the fixed-VRAM constraint that currently forces local LLM runners to over-spec at buy time, a structural change to the local inference hardware market worth tracking.
Link →
LingBot-Map: feed-forward streaming 3D reconstruction
A foundation model that reconstructs 3D scenes from streaming video data in a single forward pass — potential primitive for embodied agents or spatial AI pipelines. Too early for most product builders, but the 'streaming 3D context' capability is novel and not yet commoditized.
Link →
Convergence Watch
ornith-1.0
3 mentions across Simon Willison, HN Front Page, r/LocalLLaMA
Three independent sources on day one: Simon Willison wrote it up, HN gave 133 pts, r/LocalLLaMA has benchmarks and serving configs with speculative decode already working. Unusually fast cross-source validation for a model from a previously unknown lab. The self-scaffolding claim is the differentiator — watch how the community stress-tests it over the next 48 hours before treating it as production-ready.
glm-5.2
TRENDING
5 mentions across HN Front Page, r/LocalLLaMA
GLM 5.2 has appeared in 6 consecutive briefing days with rising source counts. Today Semgrep published benchmark data showing it outperforms Claude on production cybersecurity tasks, and r/LocalLLaMA has independent Q1_S vs Qwen27B comparisons and CPU-only Epyc runs. The cybersecurity angle is new and specific — if you're building security tooling, this warrants a direct evaluation.
qwen 3.6 27b
4 mentions across HN Front Page, r/LocalLLaMA
HN's highest-voted post today plus multiple independent r/LocalLLaMA serving configs converge on Qwen 3.6 27B as the current local dev default. This is the type of community crystallization that precedes a model becoming the baseline for local benchmarks and the default recommendation in tutorials.
llama.cpp
3 mentions across r/LocalLLaMA
DeepSeek V4 support merged today, Ornith-35B GGUF confirmed working with DFlash speculative decode, and Qwen3-TTS.cpp surfaced as a new consumer. llama.cpp mainline is absorbing major model support at an accelerating rate — pinning to a specific version is increasingly costly; staying on main is the right default.
STALE: Latent Space newest item is >48h old