What happened this week?

DeepSeek’s $6M Myth, Jevons Paradox & AI Compute, Not a Bubble Yet, Qwen 2.5VL and Gemini Flash, The Real Cost of AI Training

Feb 03, 2025

For my first ever newsletter, I’m going to be talking about the following:

DeepSeek’s $6M Myth: Is training really that cheap?
Jevons Paradox & AI Compute: More efficiency, more demand. Are we in a self-perpetuating cycle?
The Real Cost of AI Training: Are compute costs underestimated, or are optimizations closing the gap?
Not a Bubble Yet: Why AI markets haven’t hit speculative excess—yet.
Qwen 2.5VL vs. Gemini Flash: Multimodal battles heat up against OpenAI’s Operator and DeepSeekR1.

DeepSeek and Jevons Paradox: Disruption or Inevitable Oversupply?

DeepSeek's latest AI models, DeepSeekV3 and DeepSeekR1, have reignited discussions on efficiency gains versus economic viability. Advances in AI have driven inference costs down, and history suggests this leads to what is known as Jevons paradox: when technological advances make the use of a resource more efficient, total demand for that resource tends to rise rather than fall.
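
One way to make the paradox concrete is a constant-elasticity demand curve, where quantity demanded scales as price raised to the power of minus the elasticity. The sketch below is a toy model, not a forecast: the function names, the elasticity values, and the 10x price cut are all assumptions chosen for illustration. Total spend rises after a price cut only when elasticity is greater than 1, which is the condition the paradox depends on.

```python
def total_spend(price: float, elasticity: float,
                base_price: float = 1.0, base_quantity: float = 1.0) -> float:
    """Spend = price * quantity under a constant-elasticity demand curve."""
    quantity = base_quantity * (price / base_price) ** (-elasticity)
    return price * quantity


def spend_ratio_after_price_cut(cut_factor: float, elasticity: float) -> float:
    """New spend divided by old spend when price falls by cut_factor (10 means 10x cheaper)."""
    return total_spend(1.0 / cut_factor, elasticity) / total_spend(1.0, elasticity)


if __name__ == "__main__":
    # Hypothetical elasticities: inelastic, unit-elastic, elastic demand.
    for elasticity in (0.5, 1.0, 1.5):
        ratio = spend_ratio_after_price_cut(10, elasticity)
        print(f"elasticity={elasticity}: total spend changes by {ratio:.2f}x")
```

If demand for compute is elastic (the 1.5 case), a 10x drop in price grows total spend; if it is inelastic (the 0.5 case), the same drop shrinks it.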

Technical Implications of DeepSeek’s Advances

AI Scaling Efficiency: DeepSeek has optimized its training pipelines, reducing compute costs significantly. This mirrors the Sputnik moment of 1957, when the USSR launched the first artificial satellite and pushed the U.S. into an accelerated space race. As a second-order effect, startups with minimal CapEx can build on fine-tuned models, lowering the barrier to entry for AI-enabled products.
IP and Security Concerns: OpenAI and Azure may need to scrutinize API endpoint usage to prevent unauthorized model extraction if the claims of large-scale distillation are true.
Reinforcement Learning Risks: DeepSeek relies heavily on cold-start RL, which introduces vulnerabilities such as reward hacking, where a model optimizes for misleading behaviors or unintended outcomes unless it is carefully aligned. There are also open questions about AI ethics and safety: a video circulated online shows DeepSeek censoring its own response about the CCP after it had briefly started writing one. Furthermore, adversarial exploits are a serious concern. Models lacking real-world supervision are susceptible to prompt injection attacks and self-reinforcing biases.
Compute Bottlenecks vs. Power Constraints: While GPUs can scale quickly (NVIDIA producing over 1M H20s in 9 months for China), power grid expansion remains a multi-year challenge:
- Gas turbines: 2+ years
- Coal plants: 4-6 years
- Nuclear (e.g., SMRs): 7-10 years
Supply Overshoots Demand: A parallel can be drawn to Dense Wavelength-Division Multiplexing (DWDM) in the early 2000s. Fiber optic efficiency improved exponentially, but overinvestment in network capacity led to massive write-downs and bankruptcies. If AI follows a similar path, we could see significant GPU overcapacity in the coming years.

DeepSeek’s Cost Reality and Future

SemiAnalysis did an excellent job explaining the real cost of DeepSeek, and I summarize ideas from their article below. I’d highly recommend checking out the article for more details.

The Truth Behind DeepSeek’s $6M Training Cost Claim

Reported vs. Actual Cost: While DeepSeek claims a $6M training budget, real-world infrastructure costs exceed $1.6B in CapEx alone, with a considerable cost of $944M associated with operating such clusters.
Hedge Fund Involvement: High-Flyer, the Chinese hedge fund that owns DeepSeek, was an early adopter of AI in its trading algorithms. High-Flyer bought 10,000 A100 GPUs in 2021, before any export controls, primarily for algorithmic trading. Contrary to some news reports, DeepSeek is NOT a side project. As High-Flyer improved, they realized it was time to spin off DeepSeek in May 2023 with the goal of pursuing further AI capabilities with more focus.
GPU Access: SemiAnalysis reports DeepSeek has access to around 10,000 H800s and 10,000 H100s. Moreover, they have ordered more H20s, with Nvidia having produced over 1 million of the China-specific GPU in the last 9 months.
Hardware and Innovation Costs: SemiAnalysis is confident the hardware spend is well higher than $500M over the company history. “Multi-Head Latent Attention, a key innovation, took several months to develop and cost a whole team of manhours and GPU hours.” The $6M is simply the pre-training cost of the model, which excludes all of the above-mentioned costs.
Singapore as a Proxy: Before former President Biden’s AI diffusion order took effect, China appears to have used Singapore as a proxy for navigating AI export controls, a challenge for the new administration to tackle. Singapore was relegated to Tier 2 on the 3-tier list mandated by Biden’s EO.
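
To put the gap in perspective, here is a rough back-of-the-envelope sketch. The $6M, $1.6B, and $944M figures are the ones quoted above. The GPU-hour count and the $2 rental rate are not from this newsletter: they are my recollection of the arithmetic in the DeepSeek-V3 technical report and should be treated as assumptions. Adding CapEx and operating cost into one total is also a simplification of mine.

```python
# Figures quoted in this newsletter (attributed to SemiAnalysis).
HEADLINE_TRAINING_COST = 6e6   # DeepSeek's claimed training budget, USD
CAPEX = 1.6e9                  # estimated infrastructure CapEx, USD
OPERATING_COST = 944e6         # estimated cost of operating the clusters, USD

# Assumptions (not from this newsletter): the DeepSeek-V3 technical report
# prices roughly 2.788M H800 GPU-hours at an assumed $2 per GPU-hour.
GPU_HOURS = 2.788e6
RENTAL_RATE_USD_PER_GPU_HOUR = 2.0


def pretraining_run_cost(gpu_hours: float, rate: float) -> float:
    """Cost of a single training run priced as rented GPU time."""
    return gpu_hours * rate


def headline_share(headline: float, capex: float, opex: float) -> float:
    """Headline figure as a fraction of CapEx plus operating cost."""
    return headline / (capex + opex)


if __name__ == "__main__":
    run_cost = pretraining_run_cost(GPU_HOURS, RENTAL_RATE_USD_PER_GPU_HOUR)
    share = headline_share(HEADLINE_TRAINING_COST, CAPEX, OPERATING_COST)
    print(f"Single-run cost under the rental assumption: ${run_cost / 1e6:.2f}M")
    print(f"Headline figure as a share of CapEx + operating cost: {share:.2%}")
```

Under these assumptions the headline number is well under 1% of the estimated infrastructure and operating spend, which is why it is better read as the price of one run than as the cost of building the model.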

DeepSeekV3 Model Innovations

Cold Start RL Training: Inspired by AlphaZero, DeepSeekV3 learns from scratch via self-play.
Rule-Based Rewards: RL without supervised fine-tuning can lead to adversarial exploits.
Emergent Time Thinking: The model employs meta-learning for improved reasoning over time.
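GRPO Optimization: Group Relative Policy Optimization, a variant of Proximal Policy Optimization (PPO) that drops the learned value function and instead scores each sampled response against the average reward of its group.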
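
The "averaged sampled rewards" idea can be sketched in a few lines. For each prompt, GRPO samples a group of responses, scores each one, and uses the group's mean and standard deviation as the baseline. This is a minimal illustration of the advantage computation only, not DeepSeek's training code. The function name and the reward values are made up, and using the population standard deviation is my assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    Responses that beat the group average get a positive advantage,
    responses below it get a negative one.
    """
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # assumption: population standard deviation
    # eps avoids division by zero when every response gets the same reward.
    return [(r - group_mean) / (group_std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rule-based rewards for four sampled answers to one prompt:
    # 1.0 = correct final answer, 0.0 = incorrect.
    sampled_rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, advantage in zip(sampled_rewards, group_relative_advantages(sampled_rewards)):
        print(f"reward={reward:.1f} -> advantage={advantage:+.2f}")
```

Because the baseline comes from the group itself, no separate value network has to be trained, which is one reason the method is cheaper to run than standard PPO.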

Capital Cycles and AI

As we reflect on the self-perpetuating cycle described by Jevons Paradox, it’s clear that the headline figures only tell part of the story. Beneath the surface, the actual cost of AI training is far more complex. This discrepancy invites us to examine whether our current assessments of AI costs truly capture the full scope of investment required.

Current Market Reaction

Key Market Losses and Gains:

Nvidia’s shares tumbled by about 17% in a single day, erasing nearly $593 billion in market capitalization, the largest one-day loss recorded for any U.S. company. Other semiconductor stocks suffered similarly, with Broadcom down roughly 17% and Marvell Technology plunging around 19%.
Some recovery followed on Tuesday with Nvidia regaining around $260 billion in market cap.
Meta Platforms benefited indirectly from the open-source nature of DeepSeek’s approach, recording gains of 6.4% over the week (along with Zuckerberg’s $65B AI investment announcement).
Data from several market analytics groups indicated that retail investors bought the dip in semiconductor stocks (about $900M in Nvidia). A noticeable rotation into defensive sectors (such as consumer staples and healthcare ETFs) helped soften the blow across broader indices.

A Historical Perspective

Having delved into the stark contrast between reported training costs and the substantial underlying expenditures, we’re left with a critical question: Is the market overreacting? Although the numbers are dramatic, the broader capital cycle and infrastructure investments suggest that, for now, the AI market remains on solid ground. In other words, despite the recent sell-offs and valuation corrections, we aren’t witnessing a classic speculative bubble—at least not yet.

Investment in AI follows classic capital cycle dynamics:

Excess Capital Chasing Returns: As investors see high AI-driven margins, they overallocate, leading to oversupply of models and infrastructure.
Feedback Loops: Unlike prior bubbles (e.g., railroads, telecom), AI lacks a direct demand feedback loop—scaling doesn’t guarantee proportional revenue growth.

Historical Comparisons

Railroads (1840-1870): Railroads were a network technology, but territorial development incentives created an unsustainable boom. Skyrocketing land grant values acted as the feedback loop for this bubble.
Telecom Bubble (1990s): Companies fueled stock price surges by announcing network expansion, leading to unsustainable valuations. Debt-fueled equity valuations acted as the feedback loop for this bubble.
AI Buildout (2022-Present): Unlike previous bubbles, AI lacks an immediate positive feedback loop from investor enthusiasm to underlying value creation. While one can make a case that datacenter buildout and power consumption form a feedback loop, the bottlenecks highlighted below present a unique constraint.

Key Bottlenecks: Power and Data Centers

Compute vs. Power Disparity: AI compute infrastructure (GPUs, networking) can scale relatively quickly (~12 months), whereas power generation lags significantly (~4-6 years for coal, ~2 years for gas turbines).
Amazon's Small Modular Reactor Initiative: Tech companies are now exploring nuclear solutions to bridge this gap, with Amazon investing in SMRs.
Grid Expansion Challenges: Even with increased power generation, regulatory hurdles (e.g., FERC regulation) and infrastructure buildout timelines create supply constraints.

Strategic Implications for Investors and Enterprises

Short-Term Winners: Cloud providers (AWS, Azure, Google Cloud) benefit from increased AI workload demand.
Long-Term Risks: Overinvestment in AI-specific hardware could lead to significant write-downs if Jevons Paradox does not sustain demand.
Regulatory Concerns: Governments may intervene to stabilize AI energy consumption, affecting long-term deployment economics.

Did you miss this?

Even as we acknowledge that the current market turmoil doesn’t necessarily signal an impending bubble, the competitive landscape in AI is evolving at breakneck speed. New models are emerging that challenge established players. For instance, Alibaba’s Qwen 2.5VL is shaking up multimodal AI capabilities, while Google’s Gemini Flash 2.0 Thinking is positioning itself as a cost-efficient contender.

Gemini Flash Thinking vs. R1: Who Wins?

While R1 hype dominates, Google, a $2.5T company, released Gemini Flash 2.0 Thinking with superior efficiency:

Cheaper than R1, offering better cost-per-token at scale.
Longer Context Window, benefiting from memory-efficient optimizations.
Limited Benchmarks: Google only released 3 key benchmarks, making direct comparisons incomplete.

Qwen 2.5VL: The Multimodal Challenger

Qwen 2.5VL is Alibaba’s latest vision-language model:

3 sizes (3B, 7B, 72B), enabling video comprehension, document parsing, and object recognition.
Advanced Multimodal Capabilities: Supports structured output & agentic tool use.
Potential Base for Operator-like Agents: Qwen2.5-VL 72B competes with OpenAI’s Operator, with potential enterprise applications.

Final Thoughts: The Next AI Phase

AI cost reductions are accelerating, with estimates of 4x algorithmic efficiency per year (some argue 10x). DeepSeek has lowered costs, but expect another 5x reduction by EOY 2025. The real question: Can demand keep up, or are we headed for overcapacity and a GPU glut?
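
To see how quickly these rates compound, here is a small sketch. The 4x and 10x yearly rates are the estimates mentioned above. The starting cost of 100 and the two-year horizon are arbitrary assumptions for illustration, and the function name is my own.

```python
def cost_after(initial_cost: float, yearly_efficiency_gain: float, years: float) -> float:
    """Cost of the same capability after `years` of compounding efficiency gains."""
    return initial_cost / (yearly_efficiency_gain ** years)


if __name__ == "__main__":
    initial_cost = 100.0  # arbitrary units, an assumption for illustration
    for gain in (4.0, 10.0):
        after_two_years = cost_after(initial_cost, gain, 2)
        print(f"{gain:.0f}x per year: {initial_cost:.0f} -> {after_two_years:.2f} after 2 years")
```

At 4x per year the same capability costs one sixteenth as much after two years, and at 10x per year one hundredth. Whether total spend falls or rises from there depends on how much new demand the lower prices unlock.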

Sources

Doug O'Laughlin - DeepSeek Is this Jevon’s Cope

Doug O'Laughlin - Capital Cycles and AI

Rohan Paul - Rohan’s bytes
Chamath Palihapitiya - What I read this week
All In Podcast

SemiAnalysis - DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts
Acquired interview with TSMC founder Morris Chang

Arkash’s Substack
