The official story of NVIDIA’s flush goes like this.
A group of Chinese researchers released two new large language models — Deepseek V3 and R1 — that mirrored the success of the American AI elite. But they did it at a fraction of the cost and compute, touting a headline figure of just $6 million in pre-training cost. That a Chinese firm put out its technology with open-sourced transparency was perhaps an extra kick in the shorts for “OpenAI”, which has long abandoned its lofty altruistic claims.
Of course the news came with skepticism and speculation — Deepseek likely does have access to quite a few forbidden NVIDIA GPUs, and the total cost to build the models is far higher than the headline pre-training figure. Credible estimates suggest Deepseek has at least $500 million of GPUs at its disposal, some of them acquired prior to export restrictions. Nor are these bootstrapping hobbyists — while not a household name until recently, the firm has been a leader in the Chinese AI space, with an impressive research team that insiders have watched for some time.
Even so, there is little debate that the models are impressive. The transparent release and accompanying research papers make the methods clear and traceable. In short, Deepseek struck a blow to the AI thesis, or more accurately, the AI trade that has dominated U.S. equity markets for the past two years.
Arguably, we can trace the ethos of American AI development back to a paper released by OpenAI researchers five years ago in January 2020, titled Scaling Laws for Neural Language Models. The abstract gets to the point (emphasis added):
[Large Language model] loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range.
The paper suggests that three variables — parameters, data, compute — are the keys to model performance and are highly predictive of future model results. The authors likened the relationship to the natural laws of physics, such as the ideal gas law. The implication was straightforward:
Our results strongly suggest that larger models will continue to perform better, and will also be much more sample efficient than has been previously appreciated. Big models may be more important than big data.
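To make the power-law claim concrete, here is a minimal sketch of what those curves look like, assuming the approximate fit constants reported by Kaplan et al. (roughly N_c ≈ 8.8e13 and α_N ≈ 0.076 for model size, D_c ≈ 5.4e13 tokens and α_D ≈ 0.095 for data). The exact values are illustrative; the shape of the curve is the point.

```python
# Minimal sketch of the paper's headline power laws. The fit constants below
# are the approximate values reported by Kaplan et al. (2020) and are
# illustrative only; real losses depend on tokenizer, data, and architecture.

def loss_vs_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Approximate test loss (nats/token) as a function of non-embedding parameters."""
    return (n_c / n_params) ** alpha_n

def loss_vs_tokens(n_tokens: float, d_c: float = 5.4e13, alpha_d: float = 0.095) -> float:
    """Approximate test loss (nats/token) as a function of dataset size in tokens."""
    return (d_c / n_tokens) ** alpha_d

if __name__ == "__main__":
    # Each 10x jump in parameters buys a smaller, but predictable, drop in loss.
    for n in (1e8, 1e9, 1e10, 1e11, 1e12):
        print(f"{n:.0e} params -> loss ~ {loss_vs_params(n):.2f}")
```

A small exponent is the whole argument for scale: each marginal improvement in loss requires roughly another order of magnitude of parameters, data, and compute, which is precisely why the trade ran through GPUs.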
Over the past five years, LLMs have indeed gotten much bigger and better. In practice, bigger means more GPUs, more racks, more interconnects, more electricity. Call it what you want — picks and shovels, infrastructure, datacenters, hardware — these companies have been the biggest beneficiaries of the AI trade to date. NVIDIA leads the pack.
But Deepseek has challenged OpenAI’s version of physics. Through a more efficient training architecture, including a reinforcement learning feedback loop that requires no human supervision, its models produce impressive results at a fraction of the compute.
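To give a flavor of what a reinforcement learning feedback loop without human supervision means in practice, the sketch below scores candidate answers with checkable rules (is the answer formatted as expected, and is it correct?) and compares each candidate against the rest of its sampled group, loosely in the spirit of the group-relative method described in the R1 paper. The tags, reward weights, and helper names are assumptions for illustration, not Deepseek’s code.

```python
# Toy illustration of an RL feedback loop driven by rule-based, verifiable
# rewards rather than human labels, loosely in the spirit of the approach
# described in the R1 paper. Tags, reward weights, and function names here
# are assumptions for illustration, not Deepseek's actual pipeline.

import re
import statistics

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Score a completion with checkable rules: format plus answer accuracy."""
    reward = 0.0
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        reward += 0.2  # format reward: the expected answer tags are present
        if match.group(1).strip() == ground_truth:
            reward += 1.0  # accuracy reward: the answer can be checked mechanically
    return reward

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Rank each sampled completion against its own group (mean 0, unit scale)."""
    mean = statistics.fmean(rewards)
    spread = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / spread for r in rewards]

if __name__ == "__main__":
    candidates = [
        "Reasoning here... <answer>42</answer>",  # correct and well-formatted
        "<answer>41</answer>",                    # well-formatted but wrong
        "The answer is 42.",                      # correct but unparseable
    ]
    rewards = [rule_based_reward(c, "42") for c in candidates]
    print(rewards)                               # [1.2, 0.2, 0.0]
    print(group_relative_advantages(rewards))    # relative scores that drive the policy update
```

The design point is that nothing in the loop requires a human rater: any task with a mechanically checkable answer can generate its own training signal, which is what makes the feedback loop cheap to run at scale.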
To be clear, more efficient LLMs are a positive development for the commercialization of AI, reducing the costs of model development and lowering the barriers to entry for competition. But perhaps the road to AI dominance is not paved with ever-escalating capex forecasts. The next leg may depend more on architectural and training advances than on brute force. In other words, the infrastructure trade may be souring.
This story reached a fever pitch this past weekend, and the market reaction on Monday was brutal.
Gapping down 12% on the open, NVDA ended the day down 17% — its largest single-day decline since the March 2020 COVID crash. In some cases, the pain was even worse for tertiary plays riding the same datacenter thesis. (The four nuclear power plays I highlighted last week — SMR, OKLO, VST, CEG — plunged 20% to 30%, though they have since recovered some of their losses.)
This is the official story. But it’s not the whole story. It may not even be the main story.