06/08/2026 | Press release | Distributed by Public on 06/08/2026 20:03
Coinbase CEO Brian Armstrong has publicly outlined a pragmatic approach to managing AI costs, recommending, among other things, intelligently routing prompts to cheaper models for most tasks while reserving the most advanced systems for high-value work.
In a post on X on Sunday, Armstrong said Coinbase has been optimizing its AI usage by routing prompts to more affordable models where appropriate, which in some cases has kept costs roughly flat even as token consumption grows exponentially.
"We're working hard on routing prompts to cheaper models where appropriate, and in some cases have been able to keep costs roughly flat, while token usage continues to grow exponentially," he said.
Armstrong predicted that within 12 to 18 months, roughly 80% of workloads could run on models that are 99% cheaper than today's frontier systems. He argued that the latest flagship models, such as Anthropic's Opus 4.8 or OpenAI's GPT-5.5, should be reserved for "IQ maxing" scenarios, including scientific breakthroughs, complex agent orchestration, and other high-stakes applications where maximum intelligence is essential.
"This leads me to think the limiting factor will be energy and compute, not better models," he said.
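To make the arithmetic behind the 80/99 prediction concrete, here is a minimal sketch. It assumes every workload consumes the same number of tokens and that frontier pricing is the baseline of 1.0; the function name and both assumptions are illustrative and do not come from Armstrong's post.

```python
def blended_cost_ratio(cheap_share: float, cheap_discount: float) -> float:
    """Spend relative to an all-frontier baseline of 1.0.

    Assumes (hypothetically) that every workload uses the same token volume.
    """
    frontier_share = 1.0 - cheap_share       # work that stays on frontier models
    cheap_unit_cost = 1.0 - cheap_discount   # e.g. 99% cheaper -> 0.01 of baseline
    return frontier_share * 1.0 + cheap_share * cheap_unit_cost


ratio = blended_cost_ratio(cheap_share=0.80, cheap_discount=0.99)
frontier_fraction_of_spend = (1.0 - 0.80) / ratio

print(f"Blended cost vs. all-frontier baseline: {ratio:.3f}")
print(f"Share of remaining spend on frontier work: {frontier_fraction_of_spend:.1%}")
```

Under these assumptions, total spend would fall to about 21% of the all-frontier baseline, and nearly all of what remains would go to the 20% of work still running on frontier models.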
For much of the past year, the prevailing culture, especially in startups, was one of "tokenmaxxing," where companies proudly showcased massive token usage and rushed to adopt the newest, most powerful models. Y Combinator CEO Garry Tan famously advised founders to "let it rip" with tokens, and many embraced the approach without much regard for cost efficiency.
That era appears to be fading. Armstrong's post struck a chord with several prominent figures. Venture capitalist Marc Andreessen called it "interesting," while Hugging Face co-founder Julien Chaumond noted that "model routing is growing a lot these days." Box CEO Aaron Levie described the 80/99 split as "a bit extreme" but agreed that AI usage would likely stratify, with "high-end" work handled by frontier models and "high-volume" tasks shifted to cheaper alternatives.
Harvey co-founder Winston Weinberg added: "Intelligence allocation is going to be extremely important."
Glean co-founder Tony Gentilcore was more blunt, saying Armstrong's view was "spot on" and that "everyone technical already knows this." He suggested that financial markets were the only ones still extrapolating frontier model prices to infinite scale.
The move toward smarter routing addresses a growing tension in the AI ecosystem. While frontier models deliver impressive capabilities, they come at significantly higher computational and financial costs. When Anthropic released Opus 4.7, many users quickly hit rate limits and complained about rapidly escalating bills. The same pattern has played out across other providers.
By contrast, intelligent routing allows companies to match the right model to the right task - using lighter, faster, and far cheaper systems for routine queries while deploying top-tier models only when necessary. This approach not only controls costs but can also improve speed and user experience for everyday interactions.
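The routing idea can be sketched in a few lines. This is a hedged illustration only, not Coinbase's implementation: the model names, prices, task labels, and the rule itself are all hypothetical, and a production router would likely rely on a classifier or quality evaluations instead of a fixed lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, for illustration only


# Hypothetical model tiers; not real product names or prices.
CHEAP = Model("small-model", 0.01)
FRONTIER = Model("frontier-model", 1.00)

# Hypothetical task labels standing in for "high-value" work.
HIGH_VALUE_TASKS = {"scientific_research", "agent_orchestration"}


def route(task_type: str, high_stakes: bool = False) -> Model:
    """Send high-value or high-stakes work to the frontier model, the rest to the cheap one."""
    if high_stakes or task_type in HIGH_VALUE_TASKS:
        return FRONTIER
    return CHEAP


for task in ["faq_answer", "agent_orchestration", "summarize_ticket"]:
    print(task, "->", route(task).name)
```

The design choice here is to default to the cheaper tier and escalate only on an explicit signal, which mirrors the article's framing of frontier models as the exception.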
For Coinbase, the strategy has clear business implications. As a major financial platform increasingly incorporating AI features, optimizing spend without sacrificing performance is critical for maintaining margins and scaling services. Armstrong implies the company is treating AI as a core infrastructure investment that must be managed with the same discipline as any other operational expense.
Some in the industry believe that this efficiency focus could reshape how the AI market develops. If more companies follow Coinbase's lead, demand for mid-tier and specialized models may grow faster than expected, while pressure on frontier model providers could intensify to deliver better performance-per-dollar.
It also highlights a maturing understanding that raw intelligence isn't the only metric that matters - latency, cost, privacy, and reliability are equally important for real-world deployment.
But the shift also has implications for startups. Some analysts point out that early-stage companies that previously burned capital on token-heavy experimentation may now need to adopt more disciplined approaches earlier in their lifecycle. At the same time, the shift creates opportunities for new players focused on orchestration, routing layers, and cost-optimization tools.
Armstrong's post arrives at a moment when investor scrutiny of AI return on investment is increasing. With hyperscalers spending hundreds of billions on infrastructure and many enterprises still searching for clear ROI, the emphasis on efficiency and intelligent allocation feels timely.
In the broader AI ecosystem, this is seen as a healthy evolution. The initial gold-rush phase of throwing compute at every problem is giving way to a more nuanced, economically disciplined approach. As Armstrong and others suggest, the real constraint going forward may not be model intelligence but energy, compute availability, and thoughtful deployment.