The Steering Tax
Why the most capable model is often the cheapest path to working code
I’ve always defaulted to the top-tier model from whatever lab I’m using. Anthropic, Google, OpenAI. Not out of any special insight. It just felt like the right trade. Lately I’ve been trying to articulate why.
A note on terminology: every frontier lab structures its lineup differently, but for this piece I’ll use prime to mean the most capable model available (Opus 4.5, GPT-5.2, Gemini 3) and secondary for the next tier down, whether that’s a smaller variant or a previous generation. The distinction matters more than the specific names.
The pattern I keep seeing: developers three corrections deep, context filling with failed attempts, each response worse than the last. They’re not coding anymore. They’re stuck in an endless loop of re-prompting and reviewing. The secondary model is costing them hours.
Industry benchmarks measure token generation speed. But in agentic coding, the real cost isn’t generation. It’s steering: the human time spent course-correcting, re-prompting, recovering from false starts. That’s the tax. And it compounds.
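To make that concrete, here’s a back-of-envelope sketch. Every number in it is invented, not a benchmark; what matters is the shape of the comparison once you price in the human time spent steering.

```python
# Back-of-envelope cost model for one coding task.
# All numbers are illustrative assumptions, not measurements.

def total_cost(api_cost_per_attempt, attempts,
               steering_minutes_per_retry, dev_rate_per_hour=100):
    """API spend plus the human time spent re-prompting and reviewing."""
    api = api_cost_per_attempt * attempts
    # Every attempt after the first costs a round of human course-correction.
    steering = (attempts - 1) * steering_minutes_per_retry / 60 * dev_rate_per_hour
    return api + steering

# Prime model: pricier per call, but lands on the first try.
prime = total_cost(api_cost_per_attempt=0.60, attempts=1,
                   steering_minutes_per_retry=10)
# Secondary model: cheap per call, but three corrections deep.
secondary = total_cost(api_cost_per_attempt=0.10, attempts=4,
                       steering_minutes_per_retry=10)

print(f"prime: ${prime:.2f}, secondary: ${secondary:.2f}")
# prime: $0.60, secondary: $50.40
```

Quibble with any individual number; the point is that the steering term swamps the API term as soon as the attempt count climbs.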
For novel work, I want the prime model as orchestrator. Let the tooling (Claude Code, Cursor, whatever harness you’re using) decide when to delegate subtasks to secondary models. But the intelligence at the top of the decision tree should be the best available.
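Concretely, the arrangement looks something like the sketch below. It’s a toy, not anyone’s real harness: call_model, the model names, and the ROUTINE-prefix routing rule are all assumptions standing in for what Claude Code or Cursor actually do internally.

```python
# Sketch of the prime-as-orchestrator pattern. Everything here is
# hypothetical; the routing rule is deliberately crude.

PRIME = "prime-model"          # best available: plans, routes, reviews
SECONDARY = "secondary-model"  # cheaper tier: mechanical subtasks

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real completion API call.
    return f"[{model}] {prompt[:40]}"

def run_task(task: str) -> str:
    # The prime model sits at the top of the decision tree: it decomposes
    # the task and marks which subtasks are routine enough to delegate.
    plan = call_model(PRIME, f"Decompose, marking ROUTINE subtasks:\n{task}")
    results = []
    for subtask in plan.splitlines():
        # Routine work (renames, boilerplate) goes to the cheaper tier;
        # anything novel stays with the prime model.
        tier = SECONDARY if subtask.startswith("ROUTINE") else PRIME
        results.append(call_model(tier, subtask))
    # Review and integration stay with the prime model, so any steering
    # happens against the most capable reasoner.
    return call_model(PRIME, "Integrate:\n" + "\n".join(results))
```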
My sense is that the December 2025 generation of prime models widened the capability gap enough to change the calculus for more developers. Simon Willison was describing this kind of jump as crossing “an invisible capability line” as early as 2024. Whether this one is a true inflection point or just my experience, I can’t say for certain. But the ROI case has never been easier to make.



