Enterprise Strategy

The Inference Price Floor Just Moved. What's Your Model Routing Strategy Now?

May 29, 2026

DeepSeek's V4-Pro now costs $0.87 per million output tokens. Permanently.

The Inference Price Floor Just Moved. What's Your Model Routing Strategy Now?

Credit:

If this caught your attention, that’s not accidental. 

The best editorial systems don’t happen by accident. Outlever builds them.

See how it works

DeepSeek's V4-Pro now costs $0.87 per million output tokens. Permanently. According to DeepSeek's updated pricing page, as reported by Engadget, that is roughly 11x cheaper than Claude Opus 4.7 at $5/$25 per million tokens and well below GPT-5.5 at comparable coding and reasoning benchmarks. This is not a promotional loss leader. As InfoWorld noted, the company says V4-Pro was "engineered to cut the cost of long-context inference," running at roughly a quarter of the single-token compute and a tenth of the memory footprint of its predecessor at long context. The 75% discount that was supposed to expire May 31 is now the standard rate.

The pricing conversation for frontier AI inference just reset, and every enterprise team running multi-model infrastructure needs to react.

What actually changed

Until last week, DeepSeek's discount on V4-Pro looked like a customer acquisition play. Temporary, aggressive, designed to pull developers onto the platform before reverting to sustainable margins. Making it permanent sends a different signal. DeepSeek is pre-committing to a cost structure that assumes Huawei's Ascend 950 supernodes arrive in volume in H2 2026. Per WinBuzzer's reporting, Huawei aims to ship around 750,000 Ascend 950PR units during 2026. If that shipment lands, DeepSeek's cost basis drops further. If it doesn't, the company is eating margin to hold position.

The pricing now ranges from $0.003625 per million tokens on cached input to $0.87 per million on output, per Engadget. A production coding agent that loops through long-context windows all day, the exact workload enterprises are scaling right now, runs at roughly a quarter of what the same throughput costs on Anthropic or OpenAI APIs.

V4-Pro supports a 1-million-token context window and a 384K maximum output limit. As The Next Web reported, it integrates natively with Claude Code, OpenClaw, and OpenCode, the three dominant agentic coding frameworks in the Western developer ecosystem. DeepSeek did not build a walled garden, instead It built a drop-in replacement.

Why this matters beyond DeepSeek

Most enterprise AI teams will never deploy DeepSeek in production. Data-residency constraints, compliance requirements, and the geopolitical overhang of routing sensitive workloads through a Chinese provider are real barriers that no pricing model eliminate…but that misses the point.

DeepSeek has just established a visible price floor that did not exist a month ago. That floor changes three dynamics simultaneously.

First, procurement leverage. Every enterprise renegotiating an Anthropic, OpenAI, or Google contract in Q3 now has a concrete reference price. "Explain why we are paying 11x more per token for comparable benchmark performance" is a question procurement teams will ask regardless of whether they would actually switch. The negotiating dynamics shifted even for teams that never touch DeepSeek.

Second, model routing architecture. Teams running multi-model infrastructure, and most serious enterprise deployments now do, need to reassess their routing logic. The cost-quality curve bent again this week. Workloads that are not bound by data-residency rules (internal code generation, document summarization, batch processing) now have a dramatically cheaper option. The routing calculus that was optimal last month is already stale.

Third, pressure on US providers. Anthropic and OpenAI have two responses available: cut Sonnet and GPT-5.5 Mini pricing to close the gap, or argue capability hard enough to justify the premium. Expect movement on both fronts within 60 days. Google already positioned Gemini 3.5 Flash at $1.50/$9 per million tokens at I/O 2026, a middle ground that undercuts the US frontier labs but does not match DeepSeek's floor.

The strongest case against overreacting

Benchmark performance is not production performance. Total cost of ownership includes reliability, support, compliance, and integration depth, not just per-token rates. An enterprise running mission-critical inference on a provider with opaque infrastructure, Huawei-dependent compute, and no SLA that holds up in a US court is taking on risk that never appears in a token-price comparison.

Fair. But this argument gets weaker every quarter. DeepSeek's integration into Western agentic frameworks means the switching cost is increasingly just a URL change in an API config. And the teams most sensitive to inference cost, the ones running millions of agent loops per day, are exactly the teams where a 10x price difference compounds into real budget pressure.

What to do this week

If you are renewing an AI provider contract in the next 90 days, build a comparison table with DeepSeek V4-Pro pricing as the floor. You do not need to threaten a switch. You need your provider to know you understand what the market rate is.

If you are running multi-model routing, audit which workloads currently go to your most expensive model by default. Identify the subset where data-residency and compliance constraints do not apply. Model that subset at DeepSeek's price point and quantify the annual savings. That number is your business case for investing in smarter routing logic, even if you route to a cheaper tier from your current provider instead.

If you are budgeting for AI infrastructure in 2027, stop modeling inference costs as stable. The floor is moving down faster than most financial models assume. Build in a 40 to 60 percent annual decline in per-token costs and plan your architecture to capture that deflation, not get locked out of it.

The price war is not coming, it is here. The question is whether your routing strategy and vendor contracts reflect the market as it exists today or the market as it existed six months ago.