A GitHub repository indexing extracted system prompts from Claude Fable 5, Opus 4.8, Claude Code, GPT-5.5 Thinking, Codex, Gemini 3.5 Flash, Grok, Cursor, Copilot, Perplexity, and two others hit GitHub Trending this week. If you have not worked through it, the index is live now. Read it.
What you will find: carefully engineered interface specifications. Each prompt defines how the model scopes tasks, manages tool-use limits, handles persona definition, and marks what it will and will not do. The structural patterns across vendors are instructive. How coding assistants define the scope of file operations. How different vendors draw the line between proactive action and requiring explicit instruction. How persona boundary enforcement varies between general-purpose assistants and task-specific ones. How the labs handle the edge between helpfulness and restraint. These are design decisions you can study and adapt to your own agent system prompts.
What the prompts expose is the interface layer. The behavior you care about in production (reliability on edge cases, the ceiling on ambiguous instructions, failure modes under load) comes from the training: the data the model learned from, the feedback processes used to shape its responses, and the resulting model weights. A prompt can instruct a model to be conservative with file operations. The training determines whether that instruction produces conservative behavior in the edge cases that matter to your pipeline. Operators looking to these prompts for capability explanations will find interface design instead. Both layers matter. They are not the same thing.
In 1998-99, I was running product at GoAuctions, and we shipped Buy It Now before eBay did. Disney memorabilia seeded the catalog; a Christie's partnership anchored the high end. On paper, we had a better product, earlier, with stronger launch partners than the market leader. Buyers went where the listings were. Sellers went where the buyers were. eBay had the network and we had a visible product layer that did not address the underlying structural problem. The feature was real. The moat was not there.
The prompt index lands in the same position. Seeing how a frontier lab structures its tool-use limits gives you useful reference architecture for your own agent system prompts. It does not tell you why any specific model handles ambiguous instructions the way it does, where the capability ceiling actually sits, or what failure modes you will encounter at production load. Those answers live in the training. The prompt is the interface. The interface is not what makes the model work.
The same week, Patrick McCanna published an analysis arguing that the text displayed in Claude Code's Extended Thinking mode is post-hoc rationalization rather than the actual compute path the model used. The reasoning text, McCanna argues, is produced alongside the answer rather than before it. On this account, the model did not think and then explain; it produced the answer and a coherent narrative of the reasoning together. If you have been using Extended Thinking output as a debug signal, or if you have built explainability infrastructure on the assumption that those traces are faithful accounts of what the model actually did, this analysis is worth working through carefully. If McCanna is right, the traces are structured summarization. They are not a step-by-step record of model behavior.
Two signals in the same week, pointing to the same structural observation: the output surfaces AI vendors give operators are designed for usability and coherence. They are not windows into model internals. This is not a criticism of any specific vendor. It is a property of how these systems work and what is technically possible to surface to users. The system prompt is the configuration interface. Extended Thinking output is the explanation interface. Neither is the model.
For operators using Extended Thinking as a debugging mechanism: the immediate question is whether your debugging conclusions have drawn on genuine reasoning traces or on coherent-sounding narratives that describe the answer correctly without recording how it was reached. The distinction matters when your agent pipeline fails in a way the Extended Thinking output appeared to explain while the root cause was elsewhere. Treat the traces as structured output to evaluate against results, not as ground truth about model decision-making.
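One way to act on that, sketched in Python under loud assumptions: your pipeline already logs every tool call it executes, and some upstream step (manual or automated, not shown here) pulls the actions a trace claims out of its text. The ToolCall schema and both function names are hypothetical, not part of any vendor API. The sketch only compares the narrative against what actually ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation. Hypothetical schema; adapt to your own logs."""
    tool: str
    target: str


def unverified_claims(claimed: list[ToolCall], executed: list[ToolCall]) -> list[ToolCall]:
    """Actions the trace narrates that never appear in the execution log."""
    executed_set = set(executed)
    return [call for call in claimed if call not in executed_set]


def unexplained_actions(claimed: list[ToolCall], executed: list[ToolCall]) -> list[ToolCall]:
    """Actions in the execution log that the trace never mentions."""
    claimed_set = set(claimed)
    return [call for call in executed if call not in claimed_set]


if __name__ == "__main__":
    # Assumed inputs: what the trace says happened vs. what your logs recorded.
    claimed = [ToolCall("read_file", "config.yaml"), ToolCall("run_tests", "tests/")]
    executed = [ToolCall("read_file", "config.yaml"), ToolCall("write_file", "config.yaml")]

    print("narrated but not executed:", unverified_claims(claimed, executed))
    print("executed but not narrated:", unexplained_actions(claimed, executed))
```

If either list is non-empty, the trace is not a reliable account of that run, and the root-cause search should start from the execution log rather than from the narrative.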
For operators using the leaked system prompts as reference: the useful move is to study the interface design patterns. The structural decisions worth extracting concern task scope definition, tool-use limit enforcement, and how each vendor draws the boundary between autonomous action and explicit instruction. Those patterns are yours to adapt. The model behavior that sits behind those prompts is not accessible from this layer.
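To make those three patterns concrete, here is a minimal Python sketch of how they might look once adapted into your own agent harness. Nothing here is taken from any vendor's prompt: AgentPolicy, in_scope, decide, the action names, and the limits are all hypothetical. The assumption is that you enforce the policy in code around the model, since the prompt alone only states the intent.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical policy covering the three patterns named above."""
    allowed_roots: tuple[str, ...]      # task scope: directories file operations may touch
    max_tool_calls: int                 # tool-use limit for a single task
    needs_confirmation: frozenset[str]  # actions that require explicit instruction


def in_scope(policy: AgentPolicy, path: str) -> bool:
    """True if a relative path sits at or under one of the allowed roots."""
    candidate = PurePosixPath(path)
    # Reject absolute paths and parent traversal outright.
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    return any(
        PurePosixPath(root) == candidate or PurePosixPath(root) in candidate.parents
        for root in policy.allowed_roots
    )


def decide(policy: AgentPolicy, action: str, path: str, calls_so_far: int) -> str:
    """Return "allow", "ask", or "deny" for one proposed tool call."""
    if calls_so_far >= policy.max_tool_calls:
        return "deny"   # tool-use budget exhausted
    if not in_scope(policy, path):
        return "deny"   # outside the task's declared scope
    if action in policy.needs_confirmation:
        return "ask"    # autonomous action stops, explicit instruction required
    return "allow"


if __name__ == "__main__":
    policy = AgentPolicy(
        allowed_roots=("src",),
        max_tool_calls=3,
        needs_confirmation=frozenset({"delete_file"}),
    )
    print(decide(policy, "read_file", "src/app/main.py", 0))
    print(decide(policy, "delete_file", "src/app/main.py", 0))
    print(decide(policy, "read_file", "src/../secrets.env", 0))
```

The design choice worth noting is that the check runs outside the model. The prompt can describe the same scope and limits, but whether the model honors them at the edges is a property of the model, which is why the sketch treats the prompt as a statement of intent and the harness as the enforcement point.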
McCanna's analysis ends on a specific point: the Extended Thinking text is generated in the same pass as the output. The teams who wrote the leaked system prompts appear to have understood this. None of them ask the model to narrate its reasoning as a primary task. They ask it to define scope, manage limits, and stay in role. That design choice, consistent across the systems in the index, suggests a clear understanding of what the interface layer can and cannot do. The prompts are useful because their authors already held the distinction. Reading them with that frame in mind is how you get the most out of them.